Preface


This volume contains the Proceedings of the 2nd Stochastic Transport in Upper 
Ocean Dynamics Workshop held on 20-23 September 2021. After the success of the 
first workshop, the STUOD Principal Investigators: Prof. Dan Crisan (ICL), Prof. 
Bertrand Chapron (IFREMER), Prof. Darryl Holm (ICL) and Prof. Etienne Mémin 
(INRIA) were delighted to be back with another educational and inspirational event. 
The “Stochastic Transport in Upper Ocean Dynamics” (STUOD) project is supported
by an ERC Synergy Grant, led by Imperial College London, National Institute 
for Research in Digital Science and Technology (INRIA) and the French Research 
Institute for Exploitation of the Sea (IFREMER). The project aims to deliver new 
capabilities for assessing variability and uncertainty in upper ocean dynamics and 
provide decision makers a means of quantifying the effects of local patterns of sea 
level rise, heat uptake, carbon storage and change of oxygen content and pH in the 
ocean. The project will make use of multimodal data and will enhance the scientific 
understanding of marine debris transport, tracking of oil spills and accumulation of 
plastic in the sea. 

As in the previous year, the 2nd STUOD Annual Workshop 2021 focused on a 
range of fundamental topical areas, including: 


1. Observations at high resolution of upper ocean properties such as temperature, 
salinity, topography, wind, waves and velocity 

2. Large-scale numerical simulations 

3. Data-based stochastic equations for upper ocean dynamics that quantify simula- 
tion error 

4. Stochastic data assimilation to reduce uncertainty 


Each chapter in the present volume illustrates one or several of these topical 
areas. Many chapters offer new mathematical frameworks that are intended to 
enhance future research in the STUOD project. 

The event brought together 65 participants from 11 countries: UK 28, France 22, 
USA 1, Canada 1, Australia 1, Czech Republic 1, Germany 4, Italy 4, Ireland 1, 
South Africa 1 and Switzerland 1. Moreover, the workshop was well attended by 
early-career academics, post-graduate students, industry representatives
(Watson-Marlow Fluid Technology Group, OceanScope), senior members of the community
and invited guests. 

The scientific program of this 4-day hybrid event included invited presentations 
by STUOD Advisory Board Members: Prof Alberto Carrassi (University of Read- 
ing, NCEO), Prof Franco Flandoli (Scuola Normale Superiore) and Prof Sebastian 
Reich (University of Potsdam), Dr Eniko Székely (Ecole Polytechnique Fédérale 
de Lausanne, Swiss Data Science Center), individual presentations by the STUOD 
Principal Investigators and post-doctoral Researchers, snapshot presentations and 
demos. The speakers included leading mid-career and senior researchers as well as 
early-career researchers. Moreover, the forum yielded opportunities for investigators 
at an early stage of their career to have discussions with established scientists,
fostering potential future research collaborations, networking as well as inclusion 
and training of the next generation of researchers. 


The photograph above shows some participants attending the event in person 
during a break between lectures. 

Most of the lectures were video-recorded and may be viewed on the 
STUOD YouTube channel. 

The following is a brief description of the 19 contributions included in the 
proceedings: 

The submitted manuscripts include the paper by Dan Crisan and Prince
Romeo Mensah, entitled “Blow-up of Strong Solutions of the Thermal Quasi-
Geostrophic Equation”. This paper concerns the system of coupled equations that
governs the evolution of the buoyancy and potential vorticity of a fluid. This system
has been shown in recent work of the authors and their collaborators to possess
a local-in-time solution. In this paper, the authors give a characterization of the
blow-up of solutions of the system in the spirit of the classical Beale-Kato-Majda
blow-up criterion for the solution of the Euler equation.

The contribution of Arnaud Debussche, Berenger Hug, and Etienne Mémin, 
entitled “Modelling Under Location Uncertainty: A Convergent Large-Scale
Representation of the Navier-Stokes Equations”, introduces martingale solutions
for 2D and 3D stochastic Navier-Stokes equations in the framework of the modelling
under location uncertainty (LU). Such solutions are unique when the spatial
dimension is two. The authors also prove that, if the noise intensity goes to zero,
these solutions converge to a solution of the deterministic Navier-Stokes equation. 

Evgueni Dinvay considers in the paper “A Stochastic Benjamin-Bona- 
Mahony Type Equation” a particular nonlinear dispersive stochastic equation 
recently introduced as a model describing surface water waves under location 
uncertainty. The corresponding noise term is introduced through a Hamiltonian 
formulation, which guarantees the energy conservation of the flow. The author 
shows that the initial-value problem has a unique solution. 

Benjamin Dufée, Etienne Mémin, and Dan Crisan investigate in the paper 
“Observation-Based Noise Calibration: An Efficient Dynamics for the Ensem- 
ble Kalman Filter” the calibration of the stochastic noise in order to guide its 
realizations towards the observational data used for the assimilation. This is done 
in the context of the stochastic parametrization under location uncertainty (LU) and 
data assimilation. The new methodology is mathematically justified by the use of the 
Girsanov theorem and yields significant improvements in the experiments carried 
out on the surface quasi-geostrophic (SQG) model, when applied to ensemble 
Kalman filters. The test case studied in the paper shows improvements of the peak 
MSE from 85% to 93%. 

The paper by Camilla Fiorini, Pierre-Marie Boulvard, Long Li, and Etienne 
Mémin, entitled “A Two-Step Numerical Scheme in Time for Surface Quasi 
Geostrophic Equations Under Location Uncertainty”, considers the surface 
quasi-geostrophic (SQG) system under location uncertainty (LU) and proposes 
a Milstein-type scheme for these equations, which is then used in a multi-step 
method. The SQG system considered in the paper consists of one stochastic partial 
differential equation, which models the stochastic transport of the buoyancy, and a 
linear operator linking the velocity and the buoyancy. In the LU setting, the Euler- 
Maruyama scheme converges with weak order 1 and strong order 0.5. The authors 
develop higher order schemes in time, based on a Milstein-type scheme in a multi- 
step framework. They compare different kinds of Milstein schemes. The scheme 
with the best performance is then included in the two-step scheme. Finally, they 
show how their two-step scheme decreases the error in comparison to other multi- 
step schemes. 

The contribution of Franco Flandoli and Eliseo Luongo, entitled “The Dissipa- 
tion Properties of Transport Noise”, presents in a compact way the latest results 
about the dissipation properties of transport noise in fluid mechanics. Motivated 
by the fact that transport noise is natural in a passive scalar equation for the
heat diffusion and transport, the authors introduce several results about enhanced 
dissipation due to the noise. Rigorous statements are matched with numerical 
experiments to understand that the sufficient conditions stated are not yet optimal 
but give a first useful indication. 

Daniel Goodair presents in the paper “Existence and Uniqueness of Maximal 
Solutions to a 3D Navier-Stokes Equation with Stochastic Lie Transport” a 
criterion for showing that an abstract SPDE possesses a unique maximal strong 
solution. This is then applied to a 3D stochastic Navier-Stokes equation. Inspired 
by the classical work of Kato and Lai, the author provides a comparable result 
in the stochastic case applicable to a variety of noise structures such as additive, 
multiplicative and transport. In particular, the criterion is designed to fit viscous 
fluid dynamics models with stochastic advection by Lie transport. Its application to
the incompressible Navier-Stokes equation matches the existence and uniqueness 
result of the deterministic theory. 

Darryl D. Holm, Ruiao Hu, and Oliver D. Street present in “Coupling of 
Waves to Sea Surface Currents Via Horizontal Density Gradients” a set of 
mathematical models and numerical simulations motivated by satellite observations 
of horizontal sea surface fluid motions that show the close coordination between 
thermal fronts and the vertical motion of waves or, after an approximation, the 
slowly varying envelope of the rapidly oscillating waves. This coordination of fluid 
movements with wave envelopes occurs most dramatically when strong horizontal 
buoyancy gradients are present, e.g., at thermal fronts. The nonlinear models of 
this coordinated movement presented in the paper may provide future opportunities 
for the optimal design of satellite imagery that could simultaneously capture the 
dynamics of both waves and currents directly. The models derived in the paper 
appear first in their un-approximated form, then again with a slowly varying 
envelope (SVE) approximation using the WKB approach. The WKB wave-current- 
buoyancy interaction model derived by the authors for a free surface with horizontal 
buoyancy gradients indicates that the mechanism for these correlations is the 
ponderomotive force of the slowly varying envelope of rapidly oscillating waves 
acting on the surface currents via the horizontal buoyancy gradient. In this model, 
the buoyancy gradient appears explicitly in the WKB wave momentum, which in 
turn generates density-weighted potential vorticity whenever the buoyancy gradient 
is not aligned with the wave-envelope gradient. 

The contribution of Ruiao Hu and Stuart Patching, entitled “Variational 
Stochastic Parameterisations and Their Applications to Primitive Equation 
Models”, presents a numerical investigation into the stochastic parameterizations of 
the primitive equations (PE) using the stochastic advection by Lie transport (SALT)
and stochastic forcing by Lie transport (SFLT) frameworks. These frameworks were
chosen due to their structure-preserving introduction of stochasticity, which decom- 
poses the transport velocity and fluid momentum into their drift and stochastic 
parts, respectively. In this paper, the authors develop a new calibration methodology 
to implement the momentum decomposition of SFLT, and they compare this 
methodology with the Lagrangian path methodology implemented for SALT. The 
resulting stochastic primitive equations are then integrated numerically using a
modification of the FESOM2 code. For certain choices of the stochastic parameters,
the authors show that SALT causes an increase in the eddy kinetic energy field 
and an improvement in the spatial spectrum. SFLT also shows improvements in 
these areas, though to a lesser extent. The SALT approach, however, produces 
an excessive downwards diffusion of temperature, compared to high-resolution 
deterministic simulations. 

The paper by Oana Lang and Wei Pan, entitled “A Pathwise Parameterisation 
for Stochastic Transport”, sets the stage for a new probabilistic approach to
effectively calibrate in a pathwise manner a general class of stochastic nonlinear 
fluid dynamics models. The authors focus on a 2D Euler SALT equation, showing 
that the driving stochastic parameter can be calibrated in an optimal way to match a 
set of given data. Moreover, they show that this model is robust with respect to the 
stochastic parameters. 

The work by Long Li, Etienne Mémin, and Gilles Tissot, entitled “Stochastic 
Parameterization with Dynamic Mode Decomposition”, considers a physical 
stochastic parameterization to account for the effects of the unresolved small scale 
on the large-scale flow dynamics. This random model is based on a stochastic 
transport principle, which ensures a strong energy conservation. The dynamic 
mode decomposition (DMD) is performed on high-resolution data to learn a basis 
of the unresolved velocity field, on which the stochastic transport velocity is 
expressed. Time-harmonic property of DMD modes allows the authors to perform 
a clean separation between time-differentiable and time-decorrelated components. 
The corresponding random scheme is assessed on a quasi-geostrophic (QG) model. 

The paper by Alexander Lobbe, entitled “Deep Learning for the Benes Filter”,
concerns the filtering problem, in other words, the optimal estimation of a hidden 
state given partial and noisy observations. Filtering is extensively studied in the 
theoretical and applied mathematical literature. One of the central challenges in 
filtering today is the numerical approximation of the optimal filter. The author 
presents a brief study of a new numerical method based on the mesh-free neural 
network representation of the density of the solution of the filtering problem 
achieved by deep learning. Based on the classical SPDE splitting method, the 
algorithm introduced includes a recursive normalization procedure to recover the 
normalized conditional distribution of the signal process. The present work uses the 
Benes model as a benchmark: within the analytically tractable setting of the Benes 
filter, the author discusses the role of nonlinearity in the filtering model equations 
for the choice of the domain of the neural network. Further, he presents the first 
study of the neural network method with an adaptive domain for the Benes model. 

Data assimilation techniques are the state-of-the-art approaches in the recon- 
struction of a spatio-temporal geophysical state such as the atmosphere or the ocean. 
These methods rely on a numerical model that fills the spatial and temporal gaps 
in the observational network. Unfortunately, limitations regarding the uncertainty 
of the state estimate may arise when considering the restriction of the data 
assimilation problems to a small subset of observations, as encountered for instance 
in ocean surface reconstruction. These limitations motivated the exploration of 
reconstruction techniques that do not rely on numerical models. In this context, the
increasing availability of geophysical observations and model simulations motivates 
the exploitation of machine learning tools to tackle the reconstruction of ocean 
surface variables. In the paper “End-to-End Kalman Filter in a High Dimen- 
sional Linear Embedding of the Observations”, by Said Ouala, Pierre Tandeo, 
Bertrand Chapron, Fabrice Collard and Ronan Fablet, the authors formulate sea 
surface spatio-temporal reconstruction problems as state space Bayesian smoothing 
problems with unknown augmented linear dynamics. The solution of the smoothing 
problem, given by the Kalman smoother, is written in a differentiable framework 
which allows, given some training data, to optimize the parameters of the state space 
model. 

Large-scale weather can often be successfully described using a small number
of patterns. A statistical description of re-analysed pressure fields identifies these
recurring patterns with clusters in state space, also called regimes. Recently, these 
weather regimes have been described through instantaneous, local indicators of 
dimension and persistence, borrowed from dynamical systems theory and extreme 
value theory. Using similar indicators and going further, Paul Platzer, Bertrand 
Chapron, and Pierre Tandeo focus in the paper “Dynamical Properties of 
Weather Regime Transitions” on weather regime transitions. They use sixty years 
of winter-time sea-level pressure reanalysis data centred on the North-Atlantic 
Ocean and western Europe. These experiments reveal regime-dependent behaviours 
of dimension and persistence near transitions, although on average one observes an
increase of dimension and a decrease of persistence near transitions. The effect of 
transition on persistence is stronger and lasts longer than on dimension. The findings 
confirm the relevance of such dynamical indicators for the study of large-scale 
weather regimes and reveal their potential to be used for both the understanding 
and detection of weather regime transitions. 

Standard maximum likelihood or Bayesian approaches to parameter estimation 
for stochastic differential equations are known not to be robust to perturbations in the 
continuous-in-time data. In the paper “Frequentist Perspective on Robust Param- 
eter Estimation Using the Ensemble Kalman Filter”, Sebastian Reich gives 
a rather elementary explanation of this observation in the context of continuous- 
time parameter estimation using an ensemble Kalman filter. The author employs 
the frequentist perspective to shed new light on two robust estimation techniques,
namely subsampling the data and rough path corrections. He also illustrates the
findings through a simple numerical experiment. 

The contribution of Valentin Resseguier, Erwan Hascoet and Bertrand 
Chapron, entitled “Random Ocean Swell-Rays: A Stochastic Framework”,
concerns swell systems that radiate across ocean basins. Far from their sources, 
emerging surface waves have low steepness characteristics, with very slow 
amplitude variations. Swell propagation then closely follows principles of 
geometrical optics, that is, the eikonal approximation to the wave equation, with a 
constant wave period along geodesics, when following a wave packet at its group 
speed. The phase averaged evolution of quasi-linear wave fields is then dominated 
by interactions with underlying current and/or topography changes. Comparable 
to the propagation of light in a slowly varying medium, over many wavelengths,
cumulative effects can lead to refraction. This opens the possibility of using surface 
swell waves as probes to estimate turbulence along their propagating path. 

Louis Thiry, Long Li and Etienne Mémin present in the paper, entitled 
“Modified (Hyper-) Viscosity for Coarse-Resolution Ocean Models”, a simple 
parameterization for coarse-resolution ocean models. To replace computationally 
expensive high-resolution ocean models, the authors develop a computationally 
cheap parameterization for coarse-resolution models based solely on the modifi- 
cation of the viscosity term in advection equations. The parametrization is meant 
to reproduce the mean quantities like pressure, velocity or vorticity computed 
from a high-resolution reference solution or using observations. The authors 
test this new parameterization on a double-gyre quasi-geostrophic model in the 
eddy-permitting regime. The results show that the proposed scheme significantly 
improves the energy statistics and the intrinsic variability on the coarse mesh. This 
method will serve as a deterministic basis model for coarse-resolution stochastic 
parameterizations in future works. 

Resolving numerically all the scale interactions of ocean dynamics in a high- 
resolution realistic configuration is today far beyond reach, and only large-scale 
representations can be afforded. Francesco L. Tucciarone, Etienne Mémin and 
Long Li study in the paper “Primitive Equations Under Location Uncertainty: 
Analytical Description and Model Development” a stochastic parameterization 
of the ocean primitive equations derived within the modelling under location 
uncertainty framework. Numerical assessments built with the NEMO core’s code 
are provided for a double-gyres configuration. 

The paper by Yicun Zhen, Bertrand Chapron and Etienne Mémin, enti- 
tled “Bridging Koopman Operator and Time-Series Auto-Correlation Based 
Hilbert-Schmidt Operator”, considers Hilbert-Schmidt operators associated with
stationary continuous-time processes. A Hilbert space and a (time-shift) continuous 
one-parameter semigroup of isometries are introduced and analysed. Under some 
technical assumptions, the continuous one-parameter semigroup is shown to be 
equivalent, almost surely, to the classical Koopman one-parameter semigroup. 

Finally, the STUOD Organizing Committee would like to acknowledge the 
financial and in-kind support received from several sources: the European Research 
Council (ERC) under the European Union’s Horizon 2020 Research and Innovation 
Programme (ERC, Grant Agreement No 856408) — for providing funds to cover the 
travel expenses of the invited speakers, catering costs and administrative support; 
Imperial College London — for offering the conference venue. 

STUOD Organizing Committee: 

Prof. Bertrand Chapron (IFREMER) 

Prof. Dan Crisan (ICL) 

Prof. Darryl Holm (ICL) 

Prof. Etienne Mémin (INRIA) 

Dr Anna Radomska (ICL) 

May 2022 
Blow-Up of Strong Solutions of the
Thermal Quasi-Geostrophic Equation


Dan Crisan and Prince Romeo Mensah 


Abstract The Thermal Quasi-Geostrophic (TQG) equation is a coupled system 
of equations that governs the evolution of the buoyancy and the potential vorticity 
of a fluid. It has a local in time solution as proved in Crisan et al. (Theoreti- 
cal and computational analysis of the thermal quasi-geostrophic model. Preprint 
arXiv:2106.14850, 2021). In this paper, we give a criterion for the blow-up of 
solutions to the Thermal Quasi-Geostrophic equation, in the spirit of the classical 
Beale-Kato-Majda blow-up criterion (cf. Beale et al., Comm. Math. Phys. 94(1),
61-66, 1984) for the solution of the Euler equation.


Keywords Blow-up criterion · Thermal Quasi-Geostrophic equation · Modified
Helmholtz operator


1 Introduction 


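The Thermal Quasi-Geostrophic (TQG) equation is a coupled system of equations
governing the evolution of the buoyancy $b : (t,x) \in [0,T] \times \mathbb{R}^2 \mapsto b(t,x) \in \mathbb{R}$
and the potential vorticity $q : (t,x) \in [0,T] \times \mathbb{R}^2 \mapsto q(t,x) \in \mathbb{R}$ in the following
way:


\[ \partial_t b + (\mathbf{u} \cdot \nabla) b = 0, \tag{1} \]
\[ \partial_t q + (\mathbf{u} \cdot \nabla)(q - b) = -(\mathbf{u}_h \cdot \nabla) b, \tag{2} \]
\[ b(0,x) = b_0(x), \qquad q(0,x) = q_0(x), \tag{3} \]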


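where


\[ \mathbf{u} = \nabla^{\perp} \psi, \qquad \mathbf{u}_h = \tfrac{1}{2} \nabla^{\perp} h, \qquad q = (\Delta - 1)\psi + f. \tag{4} \]
Here, $\psi : (t,x) \in [0,T] \times \mathbb{R}^2 \mapsto \psi(t,x) \in \mathbb{R}$ is the streamfunction,
$h : x \in \mathbb{R}^2 \mapsto h(x) \in \mathbb{R}$ is the spatial variation around a constant bathymetry profile and
$f : x \in \mathbb{R}^2 \mapsto f(x) \in \mathbb{R}$ is the Coriolis parameter. Since we are working on the
whole space, we can supplement our system with the far-field condition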


\[ \lim_{|x| \to \infty} \big( b(x), \mathbf{u}(x) \big) = 0. \]


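Our given set of data is $(\mathbf{u}_h, f, b_0, q_0)$ with regularity class:


\[ \mathbf{u}_h \in W^{3,2}_{\mathrm{div}}(\mathbb{R}^2; \mathbb{R}^2), \quad f \in W^{2,2}(\mathbb{R}^2), \quad b_0 \in W^{3,2}(\mathbb{R}^2), \quad q_0 \in W^{2,2}(\mathbb{R}^2). \tag{5} \]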
The TQG equation models the dynamics of a submesoscale geophysical fluid in 
thermal geostrophic balance, for which the Rossby number, the Froude number and 
the stratification parameter are all of the same asymptotic order. For a historical 
overview, modelling and other issues pertaining to the TQG equation, we refer the 
reader to [4]. 

In the following, we are interested in strong solutions of the system (1)-(4) 
which can naturally be defined in terms of just b and q although the unknowns 
in the evolutionary Eqs. (1)-(2) are b, q and u. This is because for a given f, one 
can recover the velocity u from the vorticity q by solving the equation 


\[ \mathbf{u} = \nabla^{\perp} (\Delta - 1)^{-1} (q - f) \]


derived from (4). Also note that a consequence of the equation $\mathbf{u} = \nabla^{\perp} \psi$ in (4) is
that $\operatorname{div} \mathbf{u} = 0$. This means that the fluid is incompressible. With this information
in hand, we now make precise the notion of a strong solution.
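The recovery of the velocity from the vorticity can be illustrated numerically. The following is a minimal sketch only, under an assumption that differs from the setting of this paper: it works on a doubly periodic square (a torus stand-in for the whole plane), where $(\Delta - 1)^{-1}$ is division by the Fourier symbol $-(|k|^2 + 1)$. The function name `velocity_from_vorticity` and the grid layout are illustrative choices, not taken from the paper or from [4].

```python
import numpy as np

def velocity_from_vorticity(q, f, length=2 * np.pi):
    """Solve (Delta - 1) psi = q - f and return (u1, u2, psi), u = grad^perp psi.

    Assumption: q and f are sampled on an n-by-n periodic grid of side
    `length`, indexed as [i, j] ~ (x1, x2).
    """
    n = q.shape[0]
    k = 2 * np.pi * np.fft.fftfreq(n, d=length / n)   # angular wavenumbers
    k1, k2 = np.meshgrid(k, k, indexing="ij")
    symbol = -(k1**2 + k2**2) - 1.0                    # symbol of (Delta - 1); never zero
    psi_hat = np.fft.fft2(q - f) / symbol
    u1 = np.real(np.fft.ifft2(-1j * k2 * psi_hat))     # u1 = -d(psi)/dx2
    u2 = np.real(np.fft.ifft2(1j * k1 * psi_hat))      # u2 = +d(psi)/dx1
    psi = np.real(np.fft.ifft2(psi_hat))
    return u1, u2, psi

# Example with a known streamfunction: (Delta - 1) psi = -6 psi for this choice.
n = 32
x = np.arange(n) * 2 * np.pi / n
x1, x2 = np.meshgrid(x, x, indexing="ij")
psi_true = np.sin(x1) * np.cos(2 * x2)
f = np.cos(x1)
q = -6.0 * psi_true + f
u1, u2, psi = velocity_from_vorticity(q, f)
```

Because $\mathbf{u}$ is a perpendicular gradient, its discrete divergence vanishes up to rounding, which mirrors the incompressibility noted above.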


Definition 1 (Local Strong Solution) Let $(\mathbf{u}_h, f, b_0, q_0)$ be of regularity class (5).
For some $T > 0$, we call the triple $(b, q, T)$ a strong solution to the system (1)-(4)
if the following holds:


– The buoyancy $b$ satisfies $b \in C([0,T]; W^{3,2}(\mathbb{R}^2))$ and the equation


\[ b(t) = b_0 - \int_0^t \operatorname{div}(b \mathbf{u}) \, d\tau \]


holds for all $t \in [0,T]$;
– the potential vorticity $q$ satisfies $q \in C([0,T]; W^{2,2}(\mathbb{R}^2))$ and the equation

\[ q(t) = q_0 - \int_0^t \big[ \operatorname{div}((q - b)\mathbf{u}) + \operatorname{div}(b \mathbf{u}_h) \big] \, d\tau \]


holds for all $t \in [0,T]$.


Such local strong solutions exist on a maximal time interval. We define this as 
follows. 


Definition 2 (Maximal Solution) Let $(\mathbf{u}_h, f, b_0, q_0)$ be of regularity class (5). For
some $T > 0$, we call $(b, q, T_{\max})$ a maximal solution to the system (1)-(4) if:


– there exists an increasing sequence of time steps $(T_n)_{n \in \mathbb{N}}$ whose limit is $T_{\max} \in (0, \infty]$;

– for each $n \in \mathbb{N}$, the triple $(b, q, T_n)$ is a local strong solution to the system (1)-(4)
with initial condition $(b_0, q_0)$;

– if $T_{\max} < \infty$, then


\[ \limsup_{T_n \to T_{\max}} \|b(T_n)\|_{W^{3,2}(\mathbb{R}^2)} + \|q(T_n)\|_{W^{2,2}(\mathbb{R}^2)} = \infty. \tag{6} \]


We shall call $T_{\max} > 0$ the maximal time.


The existence of a unique local strong solution of (1)-(4) has recently been shown in
[4, Theorem 2.10] on the torus. A unique maximal solution also exists [4, Theorem
2.14] and the result also applies to the whole space [4, Remark 2.1]. We state the
result here for completeness.


Theorem 1 For $(\mathbf{u}_h, f, b_0, q_0)$ of regularity class (5), there exists a unique maximal
solution $(b, q, T)$ of the system (1)-(4).


Before we state our main result, let us first present some notations used throughout 
this work. 


1.1 Notations 


In the following, we write $F \lesssim G$ if there exists a generic constant $c > 0$ (that may
vary from line to line) such that $F \le c\,G$. Functions mapping into $\mathbb{R}^2$ are boldfaced
(for example the velocity $\mathbf{u}$) while those mapping into $\mathbb{R}$ are not (for example the
buoyancy $b$ and vorticity $q$). For $k \in \mathbb{N} \cup \{0\}$ and $p \in [1, \infty]$, $W^{k,p}(\mathbb{R}^2)$ is the
usual Sobolev space of functions mapping into $\mathbb{R}$ with a natural modification for
functions mapping into $\mathbb{R}^2$. For $p = 2$, $W^{k,2}(\mathbb{R}^2)$ is a Hilbert space with inner
product $\langle u, v \rangle_{W^{k,2}(\mathbb{R}^2)} = \sum_{|\beta| \le k} \langle \partial^{\beta} u, \partial^{\beta} v \rangle$, where $\langle \cdot, \cdot \rangle$ denotes the standard
$L^2$-inner product. For general $s \in \mathbb{R}$, we use the norm


\[ \|v\|_{W^{s,2}(\mathbb{R}^2)} = \Big( \int_{\mathbb{R}^2} \big(1 + |\xi|^2\big)^{s} \, |\hat{v}(\xi)|^2 \, d\xi \Big)^{1/2} \tag{7} \]
defined in frequency space. Here, $\hat{v}(\xi)$ denotes the Fourier coefficients of $v$. For
simplicity, we write $\|\cdot\|_{s,2}$ for $\|\cdot\|_{W^{s,2}(\mathbb{R}^2)}$. When $k = s = 0$, we get the usual
$L^2(\mathbb{R}^2)$ space whose norm we will simply denote by $\|\cdot\|_2$. A similar notation will
be used for norms $\|\cdot\|_p$ of general $L^p(\mathbb{R}^2)$ spaces for any $p \in [1, \infty]$ as well as
for the inner product $\langle \cdot, \cdot \rangle_{k,2} = \langle \cdot, \cdot \rangle_{W^{k,2}(\mathbb{R}^2)}$ when $k \in \mathbb{N}$. Additionally, $W^{k,p}_{\mathrm{div}}(\mathbb{R}^2)$
represents the space of divergence-free vector-valued functions in $W^{k,p}(\mathbb{R}^2)$.

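With respect to differential operators, we let $\nabla_0 := (\partial_{x_1}, \partial_{x_2}, 0)^{T}$ and $\nabla_0^{\perp} :=
(-\partial_{x_2}, \partial_{x_1}, 0)^{T}$ be the three-dimensional extensions of the two-dimensional differential
operators $\nabla = (\partial_{x_1}, \partial_{x_2})^{T}$ and $\nabla^{\perp} := (-\partial_{x_2}, \partial_{x_1})^{T}$ by zero respectively. The
Laplacian $\Delta = \operatorname{div} \nabla = \partial_{x_1 x_1} + \partial_{x_2 x_2}$ remains two-dimensional.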
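To make the frequency-space norm (7) concrete, here is a hedged discrete analogue. It assumes a periodic $n \times n$ grid rather than the whole plane, so the integral over $\xi$ becomes a sum over discrete wavenumbers. The helper name `sobolev_norm` and the normalisation (chosen so that $s = 0$ reproduces the $L^2$ norm over the periodic cell) are illustrative assumptions.

```python
import numpy as np

def sobolev_norm(v, s, length=2 * np.pi):
    """Discrete analogue of (7) on a periodic n-by-n grid of side `length`."""
    n = v.shape[0]
    k = 2 * np.pi * np.fft.fftfreq(n, d=length / n)   # angular wavenumbers
    k1, k2 = np.meshgrid(k, k, indexing="ij")
    v_hat = np.fft.fft2(v) / n**2                      # normalised Fourier coefficients
    weight = (1.0 + k1**2 + k2**2) ** s                # (1 + |xi|^2)^s
    # The factor length**2 makes s = 0 agree with the L^2 norm over the cell.
    return float(np.sqrt(length**2 * np.sum(weight * np.abs(v_hat) ** 2)))

# Example: v(x) = sin(x1) only excites |xi| = 1, so the weight equals 2**s.
n = 64
x = np.arange(n) * 2 * np.pi / n
x1, x2 = np.meshgrid(x, x, indexing="ij")
v = np.sin(x1)
norm_l2 = sobolev_norm(v, 0)   # L^2 norm over the cell
norm_h1 = sobolev_norm(v, 1)   # one derivative
```

For this single-mode example the two values differ exactly by the factor $\sqrt{2}$ coming from the weight $(1 + |\xi|^2)^{s}$ at $|\xi| = 1$.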


1.2 Main Result 


Our main result is to give a blow-up criterion, of Beale-Kato-Majda type [2], for a
strong solution $(b, q, T)$ of (1)-(4). In particular, we show the following result.


Theorem 2 Suppose that $(b, q, T)$ is a local strong solution of (1)-(4). If


\[ \int_0^T \big( \|q(t)\|_{\infty} + \|\nabla b(t)\|_{\infty} \big) \, dt = K < \infty, \tag{8} \]


then there exists a solution $(b', q', T')$ with $T' > T$, such that $(b', q') = (b, q)$ on
$[0, T]$. Moreover, for all $t \in [0, T]$,


\[ \|b(t)\|_{3,2} + \|q(t)\|_{2,2} \le \big[ e + \|b_0\|_{3,2} + \|q_0\|_{2,2} \big]^{\exp(cK)} \exp[cT \exp(cK)]. \]


An immediate consequence of the above theorem is the following: 


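Corollary 1 Assume that $(b, q, T)$ is a maximal solution. If $T < \infty$, then


\[ \int_0^T \big( \|q(t)\|_{\infty} + \|\nabla b(t)\|_{\infty} \big) \, dt = \infty \]
and in particular,


\[ \limsup_{t \nearrow T} \big( \|q(t)\|_{\infty} + \|\nabla b(t)\|_{\infty} \big) = \infty. \]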


2 Blow-Up 


We devote the entirety of this section to the proof of Theorem 2. In order to 
achieve our goal, we first derive a suitable exact solution for what is referred to 
as the modified Helmholtz equation. Some authors also call it the Screened Poisson 
equation [3] while others rather mistakenly call it the Helmholtz equation. Refer
to [1] for the difference between the Helmholtz equation and modified Helmholtz 
equation. 


2.1 Estimate for the 2D Modified Helmholtz Equation or the 
Screened Poisson Equation 


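In the following, we want to find an exact solution $\psi : \mathbb{R}^2 \to \mathbb{R}$ of


\[ (\Delta - 1)\psi(x) = w(x), \qquad \lim_{|x| \to \infty} \psi(x) = 0 \tag{9} \]


for a given function $w \in W^{2,2}(\mathbb{R}^2)$. The corresponding two-dimensional free space
Green’s function $G^{\mathrm{free}}(x)$ for (9) must therefore solve


\[ (\Delta - 1) G^{\mathrm{free}}(x - y) = \delta(x - y), \qquad \lim_{|x| \to \infty} G^{\mathrm{free}}(x - y) = 0 \tag{10} \]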


in the sense of distributions. Indeed, one can verify that the Green’s function is given
by


\[ G^{\mathrm{free}}(x - y) = -\frac{1}{2\pi} K_0(|x - y|) \tag{11} \]
see [1, Table 9.5], where

\[ K_0(z) = \int_0^{\infty} \frac{e^{-\sqrt{z^2 + r^2}}}{\sqrt{z^2 + r^2}} \, dr \]


is the modified Bessel function of the second kind, see equation (8.432-9), page 917
of [5] with $\nu = 0$ and $x = 1$. However, since the integrand above is an even function of $r$,
it follows that


\[ G^{\mathrm{free}}(x - y) = -\frac{i}{4} H_0^{(1)}(i|x - y|) = -\frac{1}{4\pi} \int_{\mathbb{R}} \frac{e^{-\sqrt{|x - y|^2 + r^2}}}{\sqrt{|x - y|^2 + r^2}} \, dr \tag{12} \]


where $H_0^{(1)}$ is the zeroth-order Hankel function of the first kind, see equation (11.117) in
[1] and equation (8.421-9) of [5] on page 915. Therefore,


\[ \psi(x) = -\frac{1}{4\pi} \int_{\mathbb{R}^3} \frac{e^{-|(x - y, -r)|}}{|(x - y, -r)|} \, w((y, 0)) \, dy \, dr \tag{13} \]


\[ =: \psi((x, 0)) \tag{14} \]
where we have used the identity $\sqrt{|x - y|^2 + r^2} = |(x, 0) - (y, r)| = |(x - y, -r)|$.
We can therefore view the argument of the streamfunction $\psi$ as a 3D-vector with
zero vertical component.
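The integral representation of $K_0$ used above can be sanity-checked numerically. The sketch below compares it with the classical representation $K_0(z) = \int_0^{\infty} e^{-z \cosh t} \, dt$, to which it reduces under the substitution $r = z \sinh t$. This second formula, the truncation lengths, and the helper names are assumptions introduced here for illustration, and the quadrature is a plain trapezoidal rule, not a production-quality Bessel routine.

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoidal rule for samples y on the grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def k0_algebraic(z, r_max=60.0, n=400_001):
    """Integral of exp(-sqrt(z^2 + r^2)) / sqrt(z^2 + r^2) over r in [0, r_max]."""
    r = np.linspace(0.0, r_max, n)
    rho = np.sqrt(z**2 + r**2)
    return trapezoid(np.exp(-rho) / rho, r)

def k0_cosh(z, t_max=12.0, n=200_001):
    """Integral of exp(-z cosh t) over t in [0, t_max]."""
    t = np.linspace(0.0, t_max, n)
    return trapezoid(np.exp(-z * np.cosh(t)), t)

def green_free(distance):
    """Free-space Green's function (11) of (Delta - 1) in 2D, for distance > 0."""
    return -k0_algebraic(distance) / (2.0 * np.pi)

k0_a = k0_algebraic(1.0)
k0_c = k0_cosh(1.0)
g_one = green_free(1.0)
```

Both quadratures agree with the tabulated value $K_0(1) \approx 0.4210244$ to well within the quadrature error, and the Green's function is negative, as expected for the inverse of $\Delta - 1$.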


2.2 Log-Sobolev Estimate for Velocity Gradient 


Our goal now is to find a suitable estimate for the Lipschitz norm of $\mathbf{u}$ that solves
\[ \mathbf{u} = \nabla^{\perp} \psi, \qquad (\Delta - 1)\psi = w \tag{15} \]


where $w \in W^{2,2}(\mathbb{R}^2)$ is given. In particular, inspired by Beale et al. [2], we aim
to show Proposition 1 below. This log-estimate is the crucial ingredient that allows
us to obtain our blow-up criterion in terms of just the buoyancy gradient and the
vorticity, although preliminary estimates may have suggested estimating the velocity
gradient as well.


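Proposition 1 For a given $w \in W^{2,2}(\mathbb{R}^2)$, any $\mathbf{u}$ solving (15) satisfies
\[ \|\mathbf{u}\|_{1,\infty} \lesssim 1 + \big( 1 + 2 \ln^{+}(\|w\|_{2,2}) \big) \|w\|_{\infty} \tag{16} \]


where $\ln^{+} a = \ln a$ if $a > 1$ and $\ln^{+} a = 0$ otherwise.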
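As a small illustration of how mildly the right-hand side of (16) depends on the higher-order norm, the following sketch evaluates it without the implicit constant hidden in $\lesssim$. The function names `log_plus` and `log_sobolev_rhs` are hypothetical helpers, and the numerical inputs are arbitrary sample values rather than norms of any particular $w$.

```python
import math

def log_plus(a):
    """ln^+ a: equals ln(a) for a > 1 and 0 otherwise."""
    return math.log(a) if a > 1.0 else 0.0

def log_sobolev_rhs(w_22, w_inf):
    """Right-hand side of (16), omitting the implicit constant."""
    return 1.0 + (1.0 + 2.0 * log_plus(w_22)) * w_inf

# Growing the W^{2,2} norm by a factor 10^6 only adds 2 * ln(10^6) * w_inf.
small = log_sobolev_rhs(1.0, 1.0)
large = log_sobolev_rhs(1.0e6, 1.0)
```

The bound is linear in $\|w\|_{\infty}$ but only logarithmic in $\|w\|_{2,2}$, which is the feature that a Beale-Kato-Majda type argument relies on.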


Proof To show (16), we fix $L \in (0, 1]$ and for $z \in \mathbb{R}^3$, we let $\zeta_L(z)$ be a smooth
cut-off function satisfying


\[ \zeta_L(z) = \begin{cases} 1 & : |z| < L, \\ 0 & : |z| > 2L \end{cases} \]


and $|\partial \zeta_L(z)| \lesssim L^{-1}$ where $\partial := \nabla_0^{\perp}$ or $\nabla_0$ as well as $|\nabla_0 \nabla_0^{\perp} \zeta_L(z)| \lesssim L^{-2}$. This
latter requirement ensures that the point of inflection of the graph of the cut-off, the
portion that is constant, concave upwards and concave downwards are all captured.
We now define the following


\[ B_1 := \{ (y, r) \in \mathbb{R}^3 : |(x, 0) - (y, r)| = |(x - y, -r)| < 2L \}, \]
\[ B_2 := \{ (y, r) \in \mathbb{R}^3 : L \le |(x - y, -r)| \le 1 \}, \]
\[ B_3 := \{ (y, r) \in \mathbb{R}^3 : |(x - y, -r)| > 1 \}, \]


so that by adding and subtracting ¢z, we obtain 


|Vu(x)| = |VoVo y (x, 0))| < [Vou ((x, 0)), 0)| + |Vo(w5 ((x, 0)), 0)| 
+ |Vo(u3 (x, 0)), 0)| + 1Vo(u3 ((x, 0)), 0)| 
+ |Vo(us((x, 0)), 0)| + |Vo(us((x, 0)), 0)| 
=: |Vouy| + |Vous| + |Vou3| + |Vous| 


+ |Vou3| + |Vous| 


where 
i ely 
wens i; t(x -= y, —)) VoVew((y, 0)) dydr, 
4r Bı (x—y, —r)| 
—|(x-y,—r)| 
Vou! = 4 I [tay Dvi i [wo 0)) dydr, 
A -y =] 
—|(-y,—r)| 
vou} = = | vove [1 = iE- y, =DE ty, O) dyar, 
i de ie —y, D] 
—|(x-y,—r) 
Vou} := — f Vell —cL(x-y, -do : Jew. 0)) dydr, 
4r Bo (x —y, —r) 
—|(x-y,—r) 
Vou$ := f Voll — tne -y. -v <—— | w((y, 0)) dydr, 
4r Bo (x = NV. —r) 
—|(x-y,—r) 
Vous := — f vove| [1 —en(x-y. -E w((y, 0)) dydr. 
4x Jp, ° (x—y,—r) 


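For $L \in (0, 1]$, we have that


\[ |\nabla_0 \mathbf{u}^1| \lesssim \Big( \int_{B_1} \frac{e^{-2|(x,0) - (y,r)|}}{|(x,0) - (y,r)|^2} \, dy \, dr \Big)^{1/2} \|\nabla_0 \nabla_0^{\perp} w((y, 0))\|_2 \]
\[ \lesssim \Big( \int_0^{2L} \frac{e^{-2s}}{s^2} \, s^2 \, ds \Big)^{1/2} \|w\|_{2,2} \lesssim \big( 1 - e^{-4L} \big)^{1/2} \|w\|_{2,2} \lesssim L^{1/2} \|w\|_{2,2}. \]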


Now note that 


2x-y)(x-y)+ 3K-y)?(xK-y)+ 


is -f 
a a TOE a eas 
0 10 0 10 


1 1 
Sa 00 ar | -100 
y 000) (x-ylr+r*)? \o 00 


a- yT a- y) (x—y)?(x-y)t 
(x-y +r»? (x-y? +r?) 


Ea w(y) drdy 


$$=: \sum_{i=1}^{6} K_i((x-y,-r)).$$

Clearly, $|x-y|^2 \le |x-y|^2 + r^2 = |(x-y,-r)|^2$ and for any $L \in (0,1]$, the
inequalities

$$(e^{-L} - e^{-1}) \le (1 - e^{-1}) \le (1 - e^{-1})(1 - \ln(L)) \lesssim (1 - \ln(L))$$

hold independently of $L$. Therefore, for $L \in (0,1]$, it follows that


$$|K_1((x-y,-r))| + |K_3((x-y,-r))| + |K_6((x-y,-r))|
\lesssim \|w\|_\infty \int_{B_2} \frac{e^{-|(x-y,-r)|}}{|(x-y,-r)|^2}\,dr\,dy
\lesssim \|w\|_\infty \int_L^1 \frac{e^{-s}}{s^2}\,s^2\,ds
\lesssim \|w\|_\infty (1 - \ln(L)).$$


Again, we can use $|x-y|^2 \le |x-y|^2 + r^2$ and the fact that the inequalities

$$(e^{-L}(L+1) - 2e^{-1}) \le (1 - 2e^{-1}) \le (1 - 2e^{-1})(1 - \ln(L)) \lesssim (1 - \ln(L))$$

hold independently of $L \in (0,1]$ to obtain

$$|K_5((x-y,-r))| \lesssim \|w\|_\infty \int_{B_2} \frac{e^{-|(x-y,-r)|}}{|(x-y,-r)|}\,dr\,dy
\lesssim \|w\|_\infty \int_L^1 \frac{e^{-s}}{s}\,s^2\,ds
\lesssim \|w\|_\infty (1 - \ln(L)).$$


Finally, for $K_2$ and $K_4$, we also obtain

$$|K_2((x-y,-r))| + |K_4((x-y,-r))| \lesssim \|w\|_\infty \int_{B_2} \frac{e^{-|(x-y,-r)|}}{|(x-y,-r)|^3}\,dr\,dy
\lesssim \|w\|_\infty \int_L^1 \frac{e^{-s}}{s^3}\,s^2\,ds \lesssim \|w\|_\infty (1 - \ln(L)).$$

We have shown that

$$|\nabla_0 u_2^1| \lesssim \|w\|_\infty (1 - \ln(L)) \tag{17}$$

for $L \in (0,1]$. Also, the quantity $(1/L^2)[e^{-L}(L+1) - e^{-2L}(2L+1)]$ is uniformly
bounded for any $L \in (0,1]$ and as such,

$$|\nabla_0 u_1^2| \lesssim \Big(\int_L^{2L} \frac{e^{-s}}{sL^2}\,s^2\,ds\Big)\|w\|_\infty \lesssim \|w\|_\infty. \tag{18}$$


Next, we note that the estimate for Vous and Vous will be the same where in 
particular, 


a al E a-y" 
Vou; = An 3 Vo [1 CL(UX y, Df |x _ yl? + r2 


a-y" 
(x-y +72)3 
=: Ky((x— y, —r)) + Ks (& — y, —7)). 


ja w(y) drdy 


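Since $|x-y| \le 1$ holds on $B_2$, it follows from the condition $|\nabla_0\zeta_L(z)| \lesssim L^{-1}$ that

$$|K_7((x-y,-r))| \lesssim \Big(\int_L^{2L} \frac{e^{-s}}{s^2 L}\,s^2\,ds\Big)\|w\|_\infty \lesssim \|w\|_\infty$$

since $(1/L)[e^{-L} - e^{-2L}]$ is uniformly bounded in $L$. Similarly, we can use the fact
that $|x-y| \le \sqrt{|x-y|^2 + r^2}$ to obtain

$$|K_8((x-y,-r))| \lesssim \Big(\int_L^{2L} \frac{e^{-s}}{s^2 L}\,s^2\,ds\Big)\|w\|_\infty \lesssim \|w\|_\infty.$$

We can therefore conclude that

$$|\nabla_0 u_2^2| + |\nabla_0 u_2^3| \lesssim \|w\|_\infty. \tag{19}$$

Similar to the estimate for $\nabla_0 u_2^1$, we have that

$$|\nabla_0 u_3| \lesssim \|w\|_\infty. \tag{20}$$

It follows by summing up the various estimates above that

$$\|\nabla u\|_\infty \lesssim L^{1/2}\|w\|_{2,2} + (1 - \ln(L))\|w\|_\infty. \tag{21}$$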


It remains to show that the estimate (21) also holds for $u$. For this, we first recall
that

$$u(x) = -\frac{1}{4\pi}\int_{\mathbb{R}^3} \nabla_0^\perp \frac{e^{-|(x-y,-r)|}}{|(x-y,-r)|}\, w((y,0))\,dy\,dr. \tag{22}$$

We now use the inequalities


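$$|x-y| \le (|x-y|^2 + r^2)^{1/2} \tag{23}$$

and

$$\Big|\frac{1}{4\pi}\nabla_0^\perp \frac{e^{-|(x-y,-r)|}}{|(x-y,-r)|}\Big| \lesssim \Big[\frac{|x-y|}{(|x-y|^2+r^2)^{3/2}} + \frac{|x-y|}{|x-y|^2+r^2}\Big] e^{-|(x-y,-r)|}$$

to obtain

$$\|u\|_\infty \lesssim \|w\|_\infty \int_{\mathbb{R}^3} \Big[\frac{1}{|x-y|^2+r^2} + \frac{1}{(|x-y|^2+r^2)^{1/2}}\Big] e^{-|(x-y,-r)|}\,dy\,dr
\lesssim \|w\|_\infty \int_0^\infty \frac{e^{-s}}{s^2}\,s^2\,ds + \|w\|_\infty \int_0^\infty \frac{e^{-s}}{s}\,s^2\,ds
\lesssim \|w\|_\infty. \tag{24}$$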
Therefore, it follows from (21) and (24) that

$$\|u\|_{1,\infty} \lesssim L^{1/2}\|w\|_{2,2} + (1 - \ln(L))\|w\|_\infty. \tag{25}$$

If $\|w\|_{2,2} \le 1$, we choose $L = 1$ and if $\|w\|_{2,2} > 1$, we take $L = \|w\|_{2,2}^{-2}$ so that
(16) holds. This finishes the proof.
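
The choice of $L$ in this last step can be made explicit. The following short computation is a sketch that uses only the bound $\|u\|_{1,\infty} \lesssim L^{1/2}\|w\|_{2,2} + (1-\ln L)\|w\|_\infty$ from (25) and shows how the two cases combine into (16):

```latex
% Case 1: \|w\|_{2,2} \le 1. Choose L = 1, so that \ln^+(\|w\|_{2,2}) = 0 and
\|u\|_{1,\infty} \lesssim \|w\|_{2,2} + \|w\|_\infty \le 1 + \|w\|_\infty .
% Case 2: \|w\|_{2,2} > 1. Choose L = \|w\|_{2,2}^{-2} \in (0,1), so that
L^{1/2}\|w\|_{2,2} = 1, \qquad 1 - \ln(L) = 1 + 2\ln(\|w\|_{2,2}),
% and therefore
\|u\|_{1,\infty} \lesssim 1 + \bigl(1 + 2\ln(\|w\|_{2,2})\bigr)\|w\|_\infty .
```

In both cases the right-hand side equals $1 + (1 + 2\ln^+(\|w\|_{2,2}))\|w\|_\infty$, which is the form stated in (16).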


Before we end the subsection, we also note that a direct computation using the
definition of Sobolev norms in frequency space (7) immediately yields

$$\|u\|_{k+1,2} \lesssim \|w\|_{k,2} \tag{26}$$

for any $k \in \mathbb{N} \cup \{0\}$ where $w \in W^{k,2}(\mathbb{R}^2)$ is a given function in (15).


2.3 A Priori Estimate 


In order to prove Theorem 2, we first need some preliminary estimates for $(b, q)$. In
the following, we define

$$\|(b,q)\|^2 := \|b\|_{3,2}^2 + \|q\|_{2,2}^2.$$

Lemma 1 A strong solution of (1)-(4) satisfies the bound

$$\frac{d}{dt}\|(b,q)\|^2 \lesssim \big(1 + \|u\|_{1,\infty} + \|\nabla b\|_\infty + \|q\|_\infty\big)\big(1 + \|(b,q)\|^2\big). \tag{27}$$

Proof Since the space of smooth functions is dense in the space $W^{3,2}(\mathbb{R}^2) \times W^{2,2}(\mathbb{R}^2)$
of existence, in the following, we work with a smooth solution pair $(b, q)$.
To achieve our desired estimate, we apply $\partial^\beta$ to (1) for $|\beta| \le 3$ to obtain

$$\partial_t \partial^\beta b + u\cdot\nabla\partial^\beta b = R_1 \tag{28}$$

where

$$R_1 := u\cdot\partial^\beta\nabla b - \partial^\beta(u\cdot\nabla b).$$

Now since $\operatorname{div} u = 0$, if we multiply (28) by $\partial^\beta b$ and integrate over space, the
second term on the left-hand side of (28) vanishes after integration by parts. On
the other hand, we can use the commutator estimate (see for instance [4, Sect. 2.2]) to
estimate the residual term $R_1$. Consequently, by multiplying (28) by $\partial^\beta b$, integrating
over space, and summing over the multiindices $\beta$ so that $|\beta| \le 3$, we obtain

$$\frac{d}{dt}\|b\|_{3,2}^2 \lesssim \big(\|\nabla u\|_\infty\|b\|_{3,2} + \|\nabla b\|_\infty\|u\|_{3,2}\big)\|b\|_{3,2}
\lesssim \big(\|\nabla u\|_\infty + \|\nabla b\|_\infty\big)\big(1 + \|(b,q)\|^2\big) \tag{29}$$

where we have used (26) for $w = q - f$ and $k = 2$.
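Next, we find a bound for $\|q\|_{2,2}^2$. For this, we apply $\partial^\beta$ to (2) for $|\beta| \le 2$ and we
obtain

$$\partial_t\partial^\beta q + u\cdot\nabla\partial^\beta(q - b) + u_h\cdot\nabla\partial^\beta b = R_2 + R_3 + R_4 \tag{30}$$

where

$$R_2 := u\cdot\partial^\beta\nabla q - \partial^\beta(u\cdot\nabla q),$$
$$R_3 := -u\cdot\partial^\beta\nabla b + \partial^\beta(u\cdot\nabla b),$$
$$R_4 := u_h\cdot\partial^\beta\nabla b - \partial^\beta(u_h\cdot\nabla b).$$

Now notice that for $U := \nabla u$, it follows from interpolation that

$$\|\nabla U\|_4^2 \lesssim \|U\|_\infty \|\nabla^2 U\|_2$$

and so,

$$\|\nabla^2 u\|_4^2 \lesssim \|\nabla u\|_\infty \|u\|_{3,2}.$$

Similarly

$$\|\nabla q\|_4^2 \lesssim \|q\|_\infty \|q\|_{2,2}.$$

Therefore,

$$\|\nabla q\|_4 \|\nabla^2 u\|_4 \lesssim \|q\|_\infty\|u\|_{3,2} + \|\nabla u\|_\infty\|q\|_{2,2}.$$

By using this estimate, we deduce from (26) and commutator estimates that

$$\|R_2\|_2 \lesssim \|\nabla u\|_\infty\|q\|_{2,2} + \|q\|_\infty(1 + \|q\|_{2,2}). \tag{31}$$

The commutators $R_3$ and $R_4$ are easy to estimate and are given by

$$\|R_3\|_2 \lesssim \|\nabla u\|_\infty\|b\|_{3,2} + \|\nabla b\|_\infty(1 + \|q\|_{2,2}), \tag{32}$$
$$\|R_4\|_2 \lesssim \|b\|_{3,2} + \|\nabla b\|_\infty, \tag{33}$$
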
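respectively, for a given $u_h \in W^{3,2}(\mathbb{R}^2; \mathbb{R}^2)$. Next, by using $\operatorname{div} u = 0$, we obtain

$$\langle u\cdot\nabla\partial^\beta q,\, \partial^\beta q\rangle = 0. \tag{34}$$

Additionally, the following estimates hold true

$$|\langle u\cdot\nabla\partial^\beta b,\, \partial^\beta q\rangle| \lesssim \|u\|_\infty\|b\|_{3,2}^2 + \|u\|_\infty\|q\|_{2,2}^2, \tag{35}$$

$$|\langle u_h\cdot\nabla\partial^\beta b,\, \partial^\beta q\rangle| \lesssim \|b\|_{3,2}^2 + \|q\|_{2,2}^2 \tag{36}$$

since $u_h \in W^{3,2}(\mathbb{R}^2; \mathbb{R}^2)$. If we now collect the estimates above (keeping in mind
that $f \in W^{2,2}(\mathbb{R}^2)$ and $u_h \in W^{3,2}(\mathbb{R}^2; \mathbb{R}^2)$), we obtain by multiplying (30) by $\partial^\beta q$
and then summing over $|\beta| \le 2$, the following

$$\frac{d}{dt}\|q\|_{2,2}^2 \lesssim \big(1 + \|u\|_{1,\infty} + \|\nabla b\|_\infty + \|q\|_\infty\big)\big(1 + \|(b,q)\|^2\big). \tag{37}$$

Summing up (29) and (37) yields the desired result.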


We now have all in hand to prove our main theorem, Theorem 2.
Proof of Theorem 2 In the following, we define the time-dependent function $g$ as

$$g(t) := e + \|(b,q)(t)\|^2, \quad \text{for } t \in [0,T]. \tag{38}$$

Next, without loss of generality, we assume that $f = 0$ so that from Proposition 1,
we obtain

$$\|u(t)\|_{1,\infty} \lesssim 1 + \big(1 + 2\ln^+(\|q(t)\|_{2,2})\big)\|q(t)\|_\infty \tag{39}$$

for $t \in [0,T]$. Using the monotonic properties of logarithms, it follows from the
above that

$$\|u(t)\|_{1,\infty} \lesssim 1 + \ln[g(t)]\big(\|\nabla b(t)\|_\infty + \|q(t)\|_\infty\big). \tag{40}$$

Furthermore, since $1 \le \ln(e + |x|)$ for any $x \in \mathbb{R}$, we can deduce from the inequality
above that

$$\|u(t)\|_{1,\infty} + \|\nabla b(t)\|_\infty + \|q(t)\|_\infty \lesssim 1 + \ln[g(t)]\big(\|\nabla b(t)\|_\infty + \|q(t)\|_\infty\big). \tag{41}$$

On the other hand, it follows from Lemma 1 that

$$g(t) \le g(0)\exp\Big(c\int_0^t \big(1 + \|u(s)\|_{1,\infty} + \|\nabla b(s)\|_\infty + \|q(s)\|_\infty\big)\,ds\Big) \tag{42}$$

for any $t \in [0,T]$. Combining (41) and (42) yields

$$g(t) \le g(0)\exp\Big(c\int_0^t \big(1 + \ln[g(s)](\|\nabla b(s)\|_\infty + \|q(s)\|_\infty)\big)\,ds\Big). \tag{43}$$

We can now take the logarithm of both sides and apply Grönwall's lemma to the
resulting inequality to obtain

$$\ln[g(t)] \le \big(\ln[g(0)] + cT\big)\exp\Big(c\int_0^t \big(\|\nabla b(s)\|_\infty + \|q(s)\|_\infty\big)\,ds\Big). \tag{44}$$
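
For readers who want the intermediate step, the passage from (43) to (44) can be sketched as follows. The shorthand $h(s) := \|\nabla b(s)\|_\infty + \|q(s)\|_\infty$ is introduced here only for this sketch and is not notation from the text.

```latex
% Take logarithms in (43) and use t \le T:
\ln[g(t)] \le \ln[g(0)] + cT + c\int_0^t h(s)\,\ln[g(s)]\,ds .
% Groenwall's lemma for y(t) := \ln[g(t)], with constant a := \ln[g(0)] + cT
% and non-negative weight c\,h(s), gives y(t) \le a \exp(c \int_0^t h(s) ds):
\ln[g(t)] \le \bigl(\ln[g(0)] + cT\bigr)\exp\Bigl(c\int_0^t h(s)\,ds\Bigr).
```

The logarithm is what turns the superlinear right-hand side of (43) into a linear integral inequality, at the price of a double exponential once the bound is exponentiated.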


At this point, we can now utilize (8), take exponentials in (44) and obtain

$$\|(b,q)(t)\|^2 \le [g(0)]^{\exp(cK)}\exp[cT\exp(cK)] \tag{45}$$

for any $t \in [0,T]$. Since the right-hand side is finite, it follows that the solution
$(b, q)$ can be continued on some interval $[0, T')$ for some $T' > T$. This finishes the
proof.


Modeling Under Location Uncertainty: A Convergent Large-Scale Representation of the Navier-Stokes Equations


Arnaud Debussche, Berenger Hug, and Etienne Mémin 


Abstract We construct martingale solutions for the stochastic Navier-Stokes equa- 
tions in the framework of the modelling under location uncertainty (LU). These 
solutions are pathwise and unique when the spatial dimension is 2D. We then prove 
that if the noise intensity goes to zero, these solutions converge, up to a subsequence 
in dimension 3, to a solution of the deterministic Navier-Stokes equation. This 
warrants that the LU Navier-Stokes equations can be interpreted as a large-scale 
model of the deterministic Navier-Stokes equation. 


1 Introduction 


For several years there has been a burst of activity to devise stochastic representations
of fluid flow dynamics. These models are strongly motivated in particular by
climate and weather forecasting issues and the need to provide accurate ensembles
of large-scale flow realisations [2]. Yet, elaborating such stochastic dynamics on
ad hoc grounds can be highly detrimental to the system of interest [4]. A minimal
mathematical requirement for a satisfactory large-scale flow dynamics representation
is that a weak solution of the Large Eddy Simulation (LES) scheme converges
toward a weak solution of the fine-scale deterministic Navier-Stokes equations
in 3D and toward the unique solution for the 2D Navier-Stokes equations. The
convergence of some classical LES models toward the true fine-scale dynamics is
well known in the deterministic case [3, 7]. However, the question of convergence
of stochastic parametrizations toward solutions of the deterministic equations in the
limit of vanishing noise is not always clear.

In this study we show that stochastic Navier-Stokes models defined within the
modelling under location uncertainty principle (LU) [9] have martingale solutions
in 3D and a unique strong solution (in the probabilistic sense) in 2D. Moreover,
in 3D, in the limit of vanishing noise, there exists a subsequence converging in law
toward a weak solution of the deterministic Navier-Stokes equations, and in 2D the
whole sequence converges toward the unique solution. As such, these results make it
possible to consider the LU representation as a valid large-scale stochastic representation of
flow dynamics that is more amenable to ensemble forecasting and data assimilation
than deterministic models, owing to its improved variability.


2 Modelling Under Location Uncertainty 


The LU formulation relies mainly on the following time-scale separation assumption
of the flow:

$$dX_t = u(X_t, t)\,dt + \sigma(X_t, t)\,dW_t \tag{1}$$

where $X : \mathbb{R}_+ \times \Omega \to S$ is the Lagrangian displacement defined within the
bounded domain $S \subset \mathbb{R}^d$ ($d = 2$ or $3$) with smooth boundary, and $u : \mathbb{R}_+ \times
S \times \Omega \to S$ denotes the large-scale velocity that is both spatially and temporally
correlated, while $\sigma\,dW$ is a highly oscillating unresolved component (also called
noise term) that is only correlated in space.

More precisely, we consider a cylindrical Wiener process $W$ on $L^2(S, \mathbb{R}^d)$, the
space of square integrable functions on $S$ with values in $\mathbb{R}^d$,

$$W = \sum_{i \in \mathbb{N}} \hat\beta^i e_i,$$

where $(e_i)_{i\in\mathbb{N}}$ is a Hilbertian orthonormal basis of $L^2(S, \mathbb{R}^d)$ and $(\hat\beta^i)_{i\in\mathbb{N}}$ is
a sequence of independent standard Brownian motions on a stochastic basis
$(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\in[0,T]}, \mathbb{P})$ ([11]). The above series does not converge in $L^2(S, \mathbb{R}^d)$ but in any
larger Hilbert space $U$ such that the embedding of $L^2(S, \mathbb{R}^d)$ into $U$ is Hilbert-Schmidt;
for instance, $U$ can be the $L^2(S)$-based Sobolev space $H^{-\alpha}(S)$ for some
$\alpha > d/2$.

The spatial structure of the noise is specified through a time-dependent deterministic
integral covariance operator $\sigma_t$ defined from a bounded and symmetric
kernel $\breve\sigma$:

$$\sigma_t f(x) = \int_S \breve\sigma(x, y, t) f(y)\,dy, \quad f \in L^2(S, \mathbb{R}^d).$$

For each $(x, y, t)$, $\breve\sigma(x, y, t)$ is a $d \times d$ symmetric tensor. Since $\breve\sigma$ is bounded
in $x$, $y$ and $t$, $\sigma_t$ maps $L^2(S, \mathbb{R}^d)$ into itself and is Hilbert-Schmidt. Then, the
noise can be written as the Wiener process:

$$\sigma_t W_t = \sum_{i\in\mathbb{N}} \hat\beta^i_t\, \sigma_t e_i,$$

where the series converges in $L^2(S, \mathbb{R}^d)$ almost surely and in $L^p(\Omega)$ for all
$p \in \mathbb{N}$, and Eq. (1) should be understood in the Itô sense. We may further write
the dependence of the Wiener process in terms of the other variables:

$$\sigma_t W_t(x, \omega) = \sum_{i\in\mathbb{N}} \hat\beta^i_t(\omega)\, \sigma_t e_i(x).$$

We consider a divergence-free noise:

$$\nabla_x \cdot \breve\sigma(x, y, t) = 0, \quad x, y \in S,\ t \ge 0.$$

Also, for each $t \in \mathbb{R}_+$, there exists $(\phi_k(t))_k$, a complete orthogonal system
composed of eigenfunctions of the covariance operator at each time $t \in \mathbb{R}_+$, and
another sequence $(\beta^k)_k$ of independent standard Brownian motions, on the same stochastic
basis $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\in[0,T]}, \mathbb{P})$, such that we have the representation:

$$\sigma_t W_t = \sum_{k=0}^{\infty} \phi_k(t)\,\beta^k_t.$$


This Gaussian random field is associated with the two-times, two-points covariance
tensor given by

$$Q(x, y, t, t') = \mathbb{E}\big(\sigma_t dW_t(x)\,[\sigma_{t'} dW_{t'}]^T(y)\big) = \int_S \breve\sigma(x, z, t)\,\breve\sigma(y, z, t')\,dz\ \delta(t - t'),$$

with the diagonal part (i.e. the one-time auto-correlation), referred to in the following as
the variance tensor, and denoted by

$$a(x, t) = \int_S \breve\sigma(x, y, t)\,\breve\sigma(x, y, t)\,dy = \sum_{k=0}^{\infty} \phi_k(x, t)\,\phi_k(x, t)^T. \tag{2}$$
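
To make the link between the modes $\phi_k$ and the variance tensor $a$ in (2) concrete, here is a minimal one-dimensional sketch. It is an illustration only: the domain $S = (0, 1)$, the scalar sine modes, the decay exponent `r`, the truncation level and all function names are assumptions chosen for simplicity, not the setting of the paper (where the $\phi_k$ are divergence-free vector fields and the index starts at $k = 0$).

```python
import math
import random

def phi(k, x, r=2.0):
    """Hypothetical scalar mode phi_k(x) = lambda_k^(-r) * e_k(x) on S = (0, 1),
    with e_k(x) = sqrt(2) sin(k pi x) and lambda_k = (k pi)^2."""
    lam = (k * math.pi) ** 2
    return lam ** (-r) * math.sqrt(2.0) * math.sin(k * math.pi * x)

def variance(x, n_modes=20, r=2.0):
    """Truncated one-point variance a(x) = sum_k phi_k(x)^2, cf. (2)."""
    return sum(phi(k, x, r) ** 2 for k in range(1, n_modes + 1))

def noise_increment(x, dt, rng, n_modes=20, r=2.0):
    """One sample of the truncated increment sum_k phi_k(x) * (beta^k_{t+dt} - beta^k_t),
    where each Brownian increment is Gaussian with variance dt."""
    return sum(phi(k, x, r) * rng.gauss(0.0, math.sqrt(dt))
               for k in range(1, n_modes + 1))

def empirical_variance(x, dt, n_samples=20000, seed=0):
    """Sample variance of the noise increment at the point x."""
    rng = random.Random(seed)
    samples = [noise_increment(x, dt, rng) for _ in range(n_samples)]
    mean = sum(samples) / n_samples
    return sum((s - mean) ** 2 for s in samples) / n_samples
```

With these assumptions, `empirical_variance(x, dt)` should be close to `variance(x) * dt`, which is the discrete counterpart of the statement that $a$ is the one-time auto-correlation of the noise. The variance vanishes at the boundary because the modes do.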


In a way similar to the classical derivation of the Navier-Stokes equations, the LU
setting is based on a stochastic representation of the Reynolds transport theorem
(SRTT) [9], describing the rate of change of a random scalar $q$ within a volume
$V(t)$ transported by the stochastic flow (1). For incompressible unresolved flows
(i.e. $\nabla\cdot\sigma = 0$), the SRTT reads

$$d\Big(\int_{V(t)} q(x, t)\,dx\Big) = \int_{V(t)} \big(D_t q + q\,\nabla\cdot(u - u_s)\,dt\big)\,dx, \tag{3a}$$

$$D_t q = d_t q + (u - u_s)\cdot\nabla q\,dt + \sigma dW_t\cdot\nabla q - \frac{1}{2}\nabla\cdot(a\nabla q)\,dt, \tag{3b}$$

where $d_t q(x, t) = q(x, t + dt) - q(x, t)$ stands for the forward time-increment of $q$
at a fixed point $x$, and $D_t$ is introduced as the stochastic transport operator in [9, 12] and
plays the role of the material derivative. Recall that $u$ is the large-scale velocity used
in (1) and $a$ is defined in (2). Note also that we omit the dependence of
$\sigma$ on time.

This operator is derived from the Itô-Wentzell formula [8] to express the
differentiation of a stochastic process transported by the flow [9]. The drift $u_s =
\frac{1}{2}\nabla\cdot a$, coined as the Itô-Stokes drift (ISD) in [1], represents, through the divergence
of the variance tensor, the effects of the small-scale inhomogeneity on the large-scale
flow component. This term can be understood as a generalization of the Stokes
drift associated with the orbital motion of waves. In addition to this modified advection,
the stochastic transport operator involves an inhomogeneous diffusion driven by the
variance tensor, which can be interpreted as a subgrid diffusion term attached to the
mixing operated by the small scales. It can be noticed that this term would only
be implicitly represented in Stratonovich integral form. However, the ISD would
remain [1]. The remaining term corresponds to the advection by the random term.
It can be observed by a direct application of the Itô formula to the norm of the scalar that the
positive energy brought by this (backscattering) term is exactly compensated by the
energy lost through the diffusion [12]. As a result, the energy of a transported quantity
is conserved pathwise, or in other words, for any realization of the flow.
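
The energy balance mentioned above can be sketched as follows for a scalar $q$ transported by the flow, i.e. $D_t q = 0$ in (3b). This is a formal computation under assumptions made here for illustration: enough regularity, vanishing boundary terms, $\nabla\cdot\sigma = 0$ and $\nabla\cdot(u - u_s) = 0$, and the mode representation $a = \sum_k \phi_k\phi_k^T$ of the variance tensor from (2).

```latex
% Transport equation D_t q = 0, written out from (3b):
d_t q = -(u - u_s)\cdot\nabla q\,dt - \sigma dW_t\cdot\nabla q
        + \tfrac12\nabla\cdot(a\nabla q)\,dt .
% Ito's formula for \|q\|_{L^2}^2: both advection terms integrate to zero
% (divergence-free fields), so only the diffusion term and the quadratic
% variation of the noise term remain:
d\|q\|_{L^2}^2
  = \underbrace{\int_S q\,\nabla\cdot(a\nabla q)\,dx\,dt}_{=\,-\int_S \nabla q\cdot a\nabla q\,dx\,dt}
  + \underbrace{\sum_k \|\phi_k\cdot\nabla q\|_{L^2}^2\,dt}_{=\,+\int_S \nabla q\cdot a\nabla q\,dx\,dt}
  = 0 .
```

The first term is the energy dissipated by the subgrid diffusion and the second is the energy injected by the random advection. They cancel because both are governed by the same tensor $a$.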

The above SRTT (3a) and Newton's second principle (in a distributional sense)
allow us to derive the following stochastic equations of motion (see Sect. 5 of [9]
or Sects. 2.2-2.3 of [10]), which, for any noise scaling parameter $\varepsilon > 0$ and for all
points of $S$, read, using $\sigma$, $u_s$, $a$ introduced above:

$$d_t u + (u - \varepsilon^2 u_s)\cdot\nabla u\,dt + \varepsilon\sigma dW_t\cdot\nabla u - \frac{1}{2}\varepsilon^2\nabla\cdot(a\nabla u)\,dt
= -\frac{1}{\rho}\nabla(p\,dt + dp_t^\sigma) + \frac{1}{Re}\Delta(u\,dt + \varepsilon\sigma\,dW_t), \tag{4}$$

with the incompressibility conditions

$$\nabla\cdot(u - \varepsilon^2 u_s) = 0, \quad \nabla\cdot\sigma = 0, \tag{5}$$

and associated with the Dirichlet boundary condition $u(t, x) = 0$ and $\breve\sigma(x, y, t) = 0$
for all $x \in \partial S$ and $t > 0$. The initial condition is denoted by $u(0, x) = u_0(x)$ for
all $x \in S$. As usual, $u(t, x) = (u_1(t, x), \ldots, u_d(t, x))$ and $p(t, x)$ stand for the
velocity and the pressure of the fluid, respectively. The term $dp_t^\sigma$ corresponds to
the Brownian (martingale) part of the pressure. The Itô-Stokes drift $u_s$ is defined
as $u_s := \frac{1}{2}\nabla\cdot a$ and $\rho$ stands for the fluid density. The dimensioning constant
$Re = UL/\nu$ denotes the Reynolds number, set from the ratio of the product of
characteristic length and velocity scales, $UL$, to the kinematic viscosity $\nu$. As
for the noise scaling parameter $\varepsilon$, it encodes a scale of the unresolved energy and
should converge to zero when all the flow components are resolved, meaning that
there is no noise and the system corresponds trivially to the deterministic Navier-Stokes
system.

Although the system corresponds to the Navier-Stokes equations for zero noise, the
convergence toward weak (strong) solutions of the 3D (2D) deterministic Navier-Stokes
equations, respectively, in the limit of vanishing noise needs to be assessed. This is the
result we aim to prove in this paper.

First of all, in order to work with a pressure-free system through a divergence-free
Leray projection, we proceed to the change of variable $v := u - \varepsilon^2 u_s$ in (4) to
rewrite the system with a classical incompressibility condition on $v$:

$$d_t v + v\cdot\nabla v\,dt - \frac{1}{Re}\Delta v\,dt + \varepsilon^2 (v\cdot\nabla)u_s\,dt - \frac{\varepsilon^2}{2}\nabla\cdot(a\nabla v)\,dt
- \frac{\varepsilon^4}{2}\nabla\cdot(a\nabla u_s)\,dt - \frac{\varepsilon^2}{Re}\Delta u_s\,dt + \varepsilon^2\partial_t u_s\,dt$$
$$= -\frac{1}{\rho}\nabla(p\,dt + dp_t^\sigma) - (\varepsilon\sigma dW_t\cdot\nabla)v - (\varepsilon^3\sigma dW_t\cdot\nabla)u_s + \frac{\varepsilon}{Re}\Delta(\sigma\,dW_t), \tag{6}$$

with the incompressibility conditions

$$\nabla\cdot v = 0, \quad \nabla\cdot\sigma = 0, \tag{7}$$

for all points in $S$, together with the Dirichlet boundary conditions $v(t, x) = 0$,
$\breve\sigma(x, y, t) = 0$ for all $x \in \partial S$ and $t > 0$, and the initial condition $v(0, x) = v_0(x) :=
u_0(x) - \varepsilon^2 u_s(0, x)$ for all $x \in S$. In the following section we specify the spaces on
which this system is defined, rewrite it in an equivalent abstract form and state our
main result.


3 Notations and Main Result 


Let $\mathcal{V}$ be the space of infinitely differentiable $d$-dimensional vector fields $u$ on $S$,
with compact support strictly contained in $S$, and satisfying $\nabla\cdot u = 0$. We denote
by $H$ the closure of $\mathcal{V}$ in $L^2(S, \mathbb{R}^d)$ and by $V$ the closure of $\mathcal{V}$ in the Sobolev space
$H^1(S, \mathbb{R}^d)$. The space $H$ is endowed with the $L^2(S, \mathbb{R}^d)$ inner product. This inner
product and its induced norm are denoted by

$$(u, v)_H = (u, v)_{L^2(S)} \quad\text{and}\quad |u|_H = \|u\|_{L^2(S)}.$$

As for the space $V$, thanks to the Poincaré inequality, it is endowed with the $H_0^1(S, \mathbb{R}^d)$
inner product and its associated norm, denoted respectively as

$$((u, v))_V = (\nabla u, \nabla v)_{L^2(S)} \quad\text{and}\quad \|u\|_V := \|\nabla u\|_{L^2(S)}.$$

We may then define the Gelfand triple $V \subset H \subset V'$ where $V'$ is the dual space of $V$
relative to $H$. We denote by $\langle\cdot,\cdot\rangle_{V'\times V}$ the duality pairing between $V'$ and $V$. The
space of Hilbert-Schmidt operators from $H$ to $H$ is denoted by $\mathcal{L}_2(H)$ and $\|\cdot\|_{\mathcal{L}_2}$
is its norm.

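System (4) may be rewritten in an equivalent simplified pressure-free formulation
by using the Leray projection $P : L^2(S, \mathbb{R}^d) \to H$ of $L^2(S, \mathbb{R}^d)$ onto the space $H$
of divergence-free vector functions. Applying Leray's projector to (6), we obtain

$$d_t v - \frac{1}{Re}P(\Delta v\,dt) + P(v\cdot\nabla v\,dt)
+ P\Big(\varepsilon^2 (v\cdot\nabla)u_s\,dt - \frac{\varepsilon^2}{2}\nabla\cdot(a\nabla v)\,dt - \frac{\varepsilon^4}{2}\nabla\cdot(a\nabla u_s)\,dt - \frac{\varepsilon^2}{Re}\Delta u_s\,dt + \varepsilon^2\partial_t u_s\,dt\Big)$$
$$= P\Big(\frac{\varepsilon}{Re}\Delta(\sigma\,dW_t) - (\varepsilon\sigma dW_t\cdot\nabla)v - (\varepsilon^3\sigma dW_t\cdot\nabla)u_s\Big). \tag{8}$$

This system can finally be rewritten in the following simplified abstract form

$$\begin{cases} d_t v(t) + Av(t)\,dt + Bv(t)\,dt + Fv(t)\,dt = G(v(t))\,dW_t, \\ v(0) = v_0. \end{cases} \tag{9}$$

The deterministic terms $A$, $B$, $F$ and the stochastic term $G$ are described below.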

Several kinds of solutions can be defined for stochastic partial differential
equations. As for deterministic PDEs, these can be strong, weak or mild (semigroup)
solutions. When the solutions are constructed for a fixed Wiener process $W$
on a given stochastic basis $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\in[0,T]}, \mathbb{P})$, they are strong in the probabilistic
sense. As usual in 3D, due to the lack of uniqueness, we work with weaker solutions,
called martingale solutions, which are defined as a triplet
composed of a stochastic basis, a Wiener process and an adapted process.

More precisely, we say that there is a martingale solution of system (9) if there
exists a stochastic basis $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\in[0,T]}, \mathbb{P})$, a cylindrical Wiener process $W$ on
$L^2(S; \mathbb{R}^d)$ and a progressively measurable process $v : [0, T] \times \Omega \to H$, with

$$v \in L^2(\Omega \times [0, T]; V) \cap L^2(\Omega, C^0([0, T]; H)),$$

such that $\mathbb{P}$-a.e., $v$ satisfies for all times $t \in [0, T]$

$$v(t) + \int_0^t Av(s)\,ds + \int_0^t Bv(s)\,ds + \int_0^t Fv(s)\,ds = v_0 + \int_0^t G(v(s))\,dW_s, \tag{10}$$

where the equality must be understood in the weak sense. We will show, for all
$\varepsilon > 0$, the existence in 3D of a martingale solution for the LU representation of
the Navier-Stokes equations for noises associated with a diffusion
tensor kernel $\breve\sigma$ that is smooth enough in space and time. In 2D, this solution is unique and strong in the
probabilistic sense. This result is summarized in the following theorem.


Theorem 1 Let $d = 2$ or $3$ and assume that the noise is smooth enough in the sense
that its variance tensor and Itô-Stokes drift are such that

$$\sup_{t\in[0,T]} \sum_{k=0}^{\infty} \|\phi_k(t)\|_{H^3(S)}^2 < \infty, \tag{11}$$


$$u_s \in L^\infty(0, T; H^2(S, \mathbb{R}^d)); \quad \partial_t u_s \in L^\infty(0, T; H) \quad\text{and}\quad a\nabla u_s \in L^\infty(0, T; V). \tag{12}$$

Then, for all $\varepsilon > 0$, Eq. (10) admits a martingale solution. Moreover, for $d = 2$, any
solution of (10) is strong in the probabilistic sense and unique.

Furthermore, when $\varepsilon \to 0$, for $d = 3$, there exists a subsequence of $(v_\varepsilon)_{\varepsilon>0}$ which
converges in law to a solution of the deterministic Navier-Stokes equation. For
$d = 2$, the whole sequence converges to the unique solution of the Navier-Stokes
equation.


The conditions of Theorem 1 simplify when the covariance operator does not
depend on time or if the ISD is divergence free. In both cases the condition on the
temporal derivative of the ISD is not necessary. We note also that, for a spatially
homogeneous noise, the variance tensor is constant and the ISD cancels. However,
this may happen only on a periodic domain or on the full space. The assumptions on
the noise are in any case not optimal, but it is not the purpose of this paper to consider
noise that is not spatially smooth, since in practice it is smooth.

Note that condition (11) is satisfied for instance if we choose $\sigma$ independent of $t$
and equal to $A^{-r}$ with $r$ large enough, where $A$ is the Stokes operator defined below.
Indeed, in this case $\phi_k = \lambda_k^{-r} e_k$ where $(e_k)_k$ is an orthonormal complete system of


eigenvectors of A associated to the eigenvalues (Ax)x and ||@x Olas = i. 
3(S) 


The behavior of the eigenvalues, $\lambda_k \sim k^{2/d}$, allows us to conclude that (11) follows.
Since $u_s = \frac{1}{2}\nabla\cdot a$ and $a$ is defined by (2), (12) holds also for $r$ large enough since


lusla) < Dro NON, o Finally, since A7” is self-adjoint and Hilbert- 


Schmidt for r > d/4, it is associated to a symmetric kernel G which is bounded for 
r large enough. 
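
The threshold $r > d/4$ can be checked numerically from the eigenvalue asymptotics quoted above. The sketch below is an illustration under the simplifying assumption that $\lambda_k = k^{2/d}$ exactly (all constants dropped); the function name is hypothetical. The squared Hilbert-Schmidt norm of $A^{-r}$ is then $\sum_k \lambda_k^{-2r} = \sum_k k^{-4r/d}$, which converges exactly when $4r/d > 1$.

```python
import math

def hs_norm_sq_partial(d, r, n_terms):
    """Partial sum of sum_{k>=1} lambda_k^(-2r) for the model lambda_k = k^(2/d),
    i.e. the truncated squared Hilbert-Schmidt norm of A^(-r)."""
    exponent = 4.0 * r / d
    return sum(k ** (-exponent) for k in range(1, n_terms + 1))
```

For $d = 2$ and $r = 1$ the exponent is $2$ and the partial sums approach $\pi^2/6$. For the borderline case $r = d/4$ the sum is the harmonic series, whose partial sums grow like $\ln n$.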

These convergence results open new interesting possibilities for the study
of turbulence or for the proposition of new large-scale representations of fluid
dynamics. From the theoretical point of view, it might be interesting to explore
multiscale versions of the LU representation based on spatial filtering together with
nested noise models. This would generalize classical large eddy models in which
the noise would depend on the spatial filtering applied: the coarser the filtering,
the larger the noise. Energy transfer between scales would then be very interesting
to study in this probabilistic setting. Stochastic Kármán-Howarth-Monin equations
for energy exchanges across scales could be obtained in this way. From a practical
point of view, these convergence results justify the use of such stochastic models
to represent large-scale solutions of the Navier-Stokes equations.


4 Proofs of the Main Result


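We introduce the Stokes operator $Av := -\frac{1}{Re}P(\Delta v)$ on the domain $D(A) :=
V \cap H^2(S, \mathbb{R}^d)$. Let $b$ be the trilinear form and $B$ the bilinear operator defined for
all $u$, $v$ and $w \in V$ by

$$b(u, v, w) = \int_S w(x)\cdot[u(x)\cdot\nabla]v(x)\,dx = \langle B(u, v), w\rangle.$$

Recall that for all $u$, $v$ and $w \in V$: $b(u, v, w) = -b(u, w, v)$. As usual, we set
$B(u) = B(u, u)$. We then define $F$ by:

$$F(v) = \varepsilon^2 B(v, u_s) - \frac{\varepsilon^2}{2}P\nabla\cdot(a\nabla v) - \frac{\varepsilon^4}{2}P\nabla\cdot(a\nabla u_s) + \varepsilon^2 Au_s + \varepsilon^2\partial_t u_s, \quad v \in V. \tag{13}$$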


It can be seen that $F(v) \in V'$. We next write the noise term as

$$G(v)\,dW_t = \sum_{k=0}^{\infty} \big(-\varepsilon A\phi_k - \varepsilon B(\phi_k, v) - \varepsilon^3 B(\phi_k, u_s)\big)\,d\beta^k_t,$$

where, as for $\sigma$, we omit the dependence of $\phi_k$ on $t$. With these notations, (8)
may indeed be rewritten as (9).

Let $(e_i)_{i\ge 0}$ be the Hilbertian basis of $H$ consisting of eigenvectors of $A$. We use
the finite dimensional orthogonal projector $P_n$, $n \in \mathbb{N}$, onto $\operatorname{Span}(e_0, \ldots, e_n)$ and
the projected operators:

$$B^n := P_n B, \quad F^n := P_n F, \quad G^n := P_n G.$$

The Galerkin approximation of (9) is given by:

$$\begin{cases} d_t v_n(t) + Av_n(t)\,dt + B^n[v_n(t)]\,dt + F^n[v_n(t)]\,dt = G^n[v_n(t)]\,dW_t, \\ v_n(0) = P_n(v_0). \end{cases} \tag{14}$$

This is a finite dimensional system of stochastic differential equations with smooth
coefficients. It has a unique local solution; by the estimate (17) below, it is global.
Apply the Itô formula to $F(x) = |x|_H^p$ for $p \ge 2$:


dilon Ol2, = plon (DIP (vnlt), G"@n()aW), 
— plop (DIE? (un (0), Avy (0) + B" vn 0) + P"on()),, dt 
—2 
4 PPK (Gand, mO, nO dt + ENG on Olkan lOl ar. 
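(15)

We have $(v_n(t), Av_n(t))_H = \frac{1}{Re}\|v_n(t)\|_V^2$, $(v_n(t), B^n v_n(t))_H = 0$ and

$$(v_n(t), F^n v_n(t))_H = \varepsilon^2\big([v_n(t)\cdot\nabla]u_s, v_n(t)\big)_H - \frac{\varepsilon^2}{2}\big(v_n(t), \nabla\cdot(a\nabla v_n(t))\big)_H
- \frac{\varepsilon^4}{2}\big(v_n(t), \nabla\cdot(a\nabla u_s)\big)_H + \varepsilon^2\big(Au_s, v_n(t)\big)_H + \varepsilon^2\big(\partial_t u_s, v_n(t)\big)_H$$
$$:= F_1^n + F_2^n + F_3^n + F_4^n + F_5^n.$$

Under the assumption (12) in Theorem 1, we have the estimate:

$$|F_1^n + F_3^n + F_4^n + F_5^n| \le C(\varepsilon^2 + \varepsilon^4)|v_n(t)|_H^2 + C(\varepsilon^2 + \varepsilon^4)$$

with $C > 0$ a finite constant. And by the definition of $a$, we have

$$F_2^n = \frac{\varepsilon^2}{2}\sum_{k=0}^{\infty} \|\phi_k\cdot\nabla v_n(t)\|_{L^2(S)}^2.$$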

Furthermore, using (11), 


2 


1 oo 
ZIG Olean SZ LMG Veni, + Ce? + 267 lm Ol, 
k=0 


and the first term corresponds exactly to Fy’. Finally, using again (11), 
(G"un(t), u(t), < 2C (e? E) n OF. 


Hence 


d;|un(t)|? + nO nO < plun (DIT? (un@), G” (un (t))dW1) 


+ C(e? +e nO + Cle? eH eH (16) 


with C > 0 depending on p (and not on £ and n). We then use classical arguments 


based in particular on Burkholder-Davis-Gundy inequality to deduce: 


O<t< 


Arguing as in [6], we prove that the laws (L(vn))n are tight in L?({0, T]; H) and 


in C°([0, T]; D(A7?/?) ). 


23 


1 sf = . 
a | sup ln + f Jun (DI? on <E[|vol?]+Ce. (17) 
By Skorohod's embedding theorem, there exists a stochastic basis (2, F, 
(Fd, P) with L?((0, T]; H) O C°({0, T]; D(A~7/7))-valued random variables vn 
for n > 1 and v such that v, has the same law as v, on L?([0, T]; H) A 


C?([0, T]; D(A~?/2)) and C°([0, T], Uo) cylindrical Wiener processes W” for 
n > 1 together with W such that (by thinning the sequences) 
Un > Din L?([0, T]; H) N C(O, T]; D(a?) Pas (18) 
W > W in C°([0, T], Uo) Pas. (19) 


For all integers n, v, verifies 

t t ü 

[Avn (r) + B” Un (r) + F'Un (r)] dr = [ow (r))dW... 
0 


(20) 
We may let n — oo in this equation and prove that v verifies for almost surely 
(t,@) € [0, T] x 2 


Un(t) — Pn(vo) +f 
0 


t 


t 
v(t) — vo + Í (Av(r) + Bu(r) + Fv(r)) dr = Í G(u(r)) dW, (21) 
0 0 


in the weak sense. For instance, let w be a smooth test function, then: 
t t t 
[ere wyuar = f K, Ort). war == f OO, w 0dr 
0 0 0 


and by the almost sure strong convergence in L?(0, T, H) this converges to 
— fo B(T), w, ((r))dr when n > ov. 

It can be shown that (17) holds for v, and letting n — oo we obtain a bound on 
v. In particular, 0 € L?(2 ; L?([0, T], V)) L2(Q; L®([0, T], H)). We then use 
the mild form of this equation to prove that v € C %([0, T], H) almost surely. 

For $d = 2$, we consider $v_1$ and $v_2$, two solutions of (9) on the same probability
space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_t, \mathbb{P})$ and, using the Itô formula and classical estimates, prove that

$$\mathbb{E}\Big[\sup_{0\le r\le T} e(r)\,|v_1(r) - v_2(r)|_H^2\Big] = 0,$$

where $e(t) := \exp\big(-\alpha\int_0^t \|v_2(r)\|_V^2\,dr\big)$ for a well chosen $\alpha$. As $\int_0^T \|v_2(r)\|_V^2\,dr <
\infty$, we deduce that, $\mathbb{P}$-a.s., $v_1 = v_2$ for all $t \in [0, T]$. We have proved that pathwise
uniqueness holds for $d = 2$. Then, using an argument due to Gyöngy and Krylov
(see for instance [5], Sect. 5), we conclude that the whole sequence $(v_n)_n$ converges
to a unique solution of (21).

Let $v_0 \in H$. For all $\varepsilon > 0$, we have proved that the abstract problem (8) admits
martingale solutions $(v_\varepsilon)_{\varepsilon>0}$. We then study whether $(v_\varepsilon)_{\varepsilon>0}$ converges, when $\varepsilon \to 0^+$,
to a solution $v$ of the following deterministic Navier-Stokes equation

$$\begin{cases} d_t v(t) + Av(t)\,dt + Bv(t)\,dt = 0, \\ v(0) = v_0. \end{cases} \tag{22}$$

When $d = 2$, the solution $v_\varepsilon$ is strong and unique. The deterministic Eq. (22)
also admits a unique weak solution $v$. By classical estimates, we prove:

$$\mathbb{E}\Big[\sup_{0\le t\le T} e(t)\,|v_\varepsilon(t) - v(t)|_H^2\Big] \xrightarrow[\varepsilon\to 0^+]{} 0,$$

where $e(t) := \exp\big(-\alpha\int_0^t \|v(r)\|_V^2\,dr\big)$ for some $\alpha > 0$.


When d = 3, inequality (17) shows that (L0ve,)) are tight in L?((0,T]; H) A 
C 9([0, T]; D(A7?/ 2) ). Using Skorohod embedding theorem, we show that a 
subsequence converges in law to a weak solution of (22).


A Stochastic Benjamin-Bona-Mahony Type Equation


Evgueni Dinvay 


Abstract Considered herein is a particular nonlinear dispersive stochastic equation. 
It was introduced recently in Dinvay and Mémin (Proc. R. Soc. A. 478:20220050, 
2022), as a model describing surface water waves under location uncertainty. The 
corresponding noise term is introduced through a Hamiltonian formulation, which 
guarantees the energy conservation of the flow. Here the initial-value problem is
studied.


1 Introduction


Consideration is given to the following Stratonovich one-dimensional BBM-type
equation

$$du = -\partial_x K(u + Ku^2)\,dt + \sum_j \gamma_j\,\partial_x(u + Ku^2)\circ dW_j \tag{1}$$

introduced in [4], as a model describing surface waves of a fluid layer. It is
supplemented with the initial condition $u(0) = u_0$. Equation (1) has a Hamiltonian
structure with the energy

$$\mathcal{H}(u) = \int_{\mathbb{R}} \Big(\frac{1}{2}\big(K^{-1/2}u\big)^2 + \frac{1}{3}u^3\Big)\,dx. \tag{2}$$

The Fourier multiplier operator $K$, defined in the space of tempered distributions
$\mathcal{S}'(\mathbb{R})$, has an even symbol of the form

$$K(\xi) \simeq (1 + \xi^2)^{-\sigma_0} \tag{3}$$

with $\sigma_0 > 1/2$. Expression (3) means that the symbol $K(\xi)$ is bounded from below
and above by RHS(3) multiplied by some positive constants. In other words, the
operator $K$ essentially behaves as the Bessel potential of order $2\sigma_0$, see [6]. The
space variable is $x \in \mathbb{R}$ and the time variable is $t \ge 0$. The unknown $u$ is a
real-valued function of these variables and of the probability variable $\omega \in \Omega$,
representing the free surface elevation in the fluid layer. The scalar sequence $\{\gamma_j\}$
satisfies the restriction $\sum_j \gamma_j^2 < \infty$, and $\{W_j\}$ is a sequence of independent scalar
Brownian motions on a filtered probability space $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}, \mathbb{P})$.

Model (1) was introduced in [4], where an attempt to extend an elegant 
Hamiltonian formulation of [1] to the stochastic setting was made. We will just 
briefly comment on the methodology of [4]. The white noise is firstly introduced 
via the stochastic transport theory presented in [8], which is based on splitting 
of fluid particle motion into smooth and random movements. Then it is restricted 
to a particular Stratonovich form in order to respect the energy conservation. In 
particular, it provides us with a model having multiplicative noise of Hamiltonian 
structure. Finally, a long wave approximation results in simplified models such as (1).

One may notice that after discarding the nonlinear terms in Eq. (1) (the details
can be seen in [4]), the corresponding linearised initial-value problem can be solved
exactly with the help of the fundamental multiplier operator

$$S(t, t_0) = \exp\Big[-\partial_x K(t - t_0) + \sum_j \gamma_j\,\partial_x\big(W_j(t) - W_j(t_0)\big)\Big], \tag{4}$$

where $t_0, t \in \mathbb{R}$. Note that it can be factorised as $S(t, t_0) = S(t - t_0)S_W(t, t_0)$,
where $S(t) = \exp(-\partial_x K t)$ is a unitary semigroup and $S_W$, containing all the
randomness coming from the Wiener process, is unitary as well. They obviously
commute as bounded differential operators. We recall that $S(t)$ is defined via the
Fourier transform $\mathcal{F}(S(t)\psi)(\xi) = \exp(-i\xi K(\xi)t)\,\hat\psi(\xi)$ for any $\psi \in \mathcal{S}'(\mathbb{R})$ and
$\hat\psi = \mathcal{F}\psi$. Similarly, $S_W(t, t_0)$ is defined by the line

$$S_W(t, t_0)\psi = \mathcal{F}^{-1}\Big(\xi \mapsto \exp\Big[i\xi\sum_j \gamma_j\big(W_j(t) - W_j(t_0)\big)\Big]\hat\psi(\xi)\Big).$$
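
The unitarity of $S(t)$ can be illustrated numerically. The following sketch is a toy under explicit assumptions: it replaces the real line by a periodic grid, uses a naive discrete Fourier transform, and takes the model symbol $K(\xi) = (1 + \xi^2)^{-\sigma_0}$, which is one admissible choice in (3). All function names are hypothetical.

```python
import cmath
import math

def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (unnormalised forward, 1/n on the inverse)."""
    n = len(values)
    sign = 1.0 if inverse else -1.0
    out = []
    for m in range(n):
        acc = sum(values[idx] * cmath.exp(sign * 2j * math.pi * m * idx / n)
                  for idx in range(n))
        out.append(acc / n if inverse else acc)
    return out

def apply_S(psi, t, sigma0=1.0, length=2.0 * math.pi):
    """Apply the multiplier exp(-i xi K(xi) t) with K(xi) = (1 + xi^2)^(-sigma0)
    to grid values psi on a periodic interval of the given length."""
    n = len(psi)
    psi_hat = dft(psi)
    out_hat = []
    for m in range(n):
        freq = m if m <= n // 2 else m - n       # signed integer wavenumber
        xi = 2.0 * math.pi * freq / length
        symbol = cmath.exp(-1j * xi * (1.0 + xi * xi) ** (-sigma0) * t)
        out_hat.append(symbol * psi_hat[m])      # |symbol| = 1, so the norm is preserved
    return dft(out_hat, inverse=True)

def l2_norm(values):
    """Discrete l^2 norm of a list of (possibly complex) grid values."""
    return math.sqrt(sum(abs(v) ** 2 for v in values))
```

Because the symbol has modulus one, `l2_norm(apply_S(psi, t))` agrees with `l2_norm(psi)` up to rounding, and applying the operator for times `0.3` and then `0.4` matches a single application for time `0.7`. The same reasoning applies to $S_W$, whose symbol is also of modulus one for every realization of the Brownian motions.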


It allows us to represent (1) in the Duhamel form

$$u(t) = S(t, 0)\Big(u_0 + \int_0^t S(0, s)f(u(s))\,ds + \sum_j \gamma_j\int_0^t S(0, s)g(u(s))\,dW_j(s)\Big), \tag{5}$$

where


fU) =a, Ku? +Y yd K Udy Ku’) 
j 


and

$$g(u) = \partial_x K u^2.$$


Existence and uniqueness of solutions to Eq. (5) are the subject of this paper. It is
worth pointing out that both $S_W$ and the stochastic integral in (5) are well defined.
Indeed, appealing to Doob's inequalities for the submartingale $\big|\sum_j \gamma_j W_j\big|$ and the
Itô–Nisio theorem, one can show that $\sum_j \gamma_j W_j$ converges uniformly in time almost
surely, in probability and in the $L^2$ sense. If the integrand of the stochastic integral
in (5) is in some Sobolev space $H^\sigma(\mathbb{R})$ for each $s$ and a.e. $\omega$, then we can understand
this sum of integrals as an integration with respect to a $Q$-Wiener process associated
with a Hilbert space $\mathcal{H}$ and a non-negative symmetric trace class operator $Q$ having
eigenvalues $\gamma_j^2$ and eigenfunctions $e_j$ forming an orthonormal basis in $\mathcal{H}$. Then the
corresponding integrand is the unbounded linear operator between $\mathcal{H}$ and $H^\sigma(\mathbb{R})$
that maps all $e_j$ to the same element of $H^\sigma(\mathbb{R})$, namely, to $S(0, s)g(u(s))$. In
particular, this explains why we need the summability condition $\sum_j \gamma_j^2 < \infty$.

Before formulating the main result, we introduce some notation.
By $C(0, T; H^\sigma(\mathbb{R}))$ we denote the space of continuous functions on $[0, T]$
with values in $H^\sigma(\mathbb{R})$, equipped with the usual supremum norm.


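Theorem 1 Let $\sigma_0 > 1/2$ and $\sigma > \max\{\sigma_0, 1\}$. Then for any $\mathcal{F}_0$-measurable $u_0 \in
L^2(\Omega; H^\sigma(\mathbb{R})) \cap L^\infty(\Omega; H^{\sigma_0}(\mathbb{R}))$ with sufficiently small $L^\infty H^{\sigma_0}$-norm and any
$T_0 > 0$ Eq. (5) has a unique adapted solution $u \in L^2(\Omega; C(0, T_0; H^\sigma(\mathbb{R}))) \cap
L^\infty(\Omega; C(0, T_0; H^{\sigma_0}(\mathbb{R})))$. Moreover, $H(u(t)) = H(u_0)$ for each $t \in [0, T_0]$ almost
surely on $\Omega$.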


The conservation of energy (2) plays a crucial role in the proof. It is therefore
more convenient to work with the energy norm defined by

$$\|u\|_{\mathcal{H}}^2 = \frac12 \int_{\mathbb{R}} \big( K^{-1/2} u \big)^2 dx$$

instead of the spatial $H^{\sigma_0}$-norm. They are obviously equivalent.

The proof is essentially based on the contraction mapping principle. We do not
exploit the smoothing properties of the group $S(t, t_0)$ much, as is done for example
in [2] for the analysis of a stochastic nonlinear Schrödinger equation. It is enough to
know that the absolute value of its symbol equals one, and that $S(t)$ is a unitary
semigroup. However, in order to appeal to the fixed point theorem we have to
truncate both the deterministic nonlinearity $f$ and the random nonlinearity $g$. There
are a couple of technical difficulties related to the use of energy conservation in our
case. Firstly, for the truncated equation we can claim conservation of $H$ only up to a
particular stopping time. Secondly, one can control $\|u\|_{\mathcal{H}}$ by $H(u)$ only provided
$\|u\|_{\mathcal{H}}$ is small. These additional difficulties force us to repeat the arguments of the
last section of the paper iteratively in order to construct a solution on the whole time
interval $[0, T_0]$.

As a final remark we point out that the noise in Eq. (1) can be gathered into the
one-dimensional term $\partial_x (u + K u^2) \circ dB$ with the scalar Brownian motion $B = \sum_j \gamma_j W_j$.
However, this does not affect the proof below, so we continue to work with the
original formulation (1). In future work we plan to extend the result to $\gamma_j$ being
either Fourier multipliers or space-dependent coefficients.


2 Truncation 


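The Sobolev space $H^\sigma(\mathbb{R})$ consists of tempered distributions $u$ having finite
square norm $\|u\|_{H^\sigma}^2 = \int |\hat u(\xi)|^2 (1 + \xi^2)^\sigma d\xi < \infty$. Let $\theta \in C_0^\infty(\mathbb{R})$ with $\operatorname{supp} \theta \subset
[-2, 2]$ be such that $\theta(x) = 1$ for $x \in [-1, 1]$ and $0 \le \theta(x) \le 1$ for $x \in \mathbb{R}$. For
$R > 0$ we introduce the cut-off $\theta_R(x) = \theta(x/R)$ and

$$f_R(u) = \theta_R(\|u\|_{H^\sigma}) f(u), \qquad g_R(u) = \theta_R(\|u\|_{H^\sigma}) g(u)$$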


that we substitute in (5) instead of $f(u)$, $g(u)$, respectively. The new
$R$-regularisation of (5) reads as

$$u(t) = S(t, t_0)\Big( u(t_0) + \int_{t_0}^t S(t_0, s) f_R(u(s))\,ds + \sum_j \gamma_j \int_{t_0}^t S(t_0, s) g_R(u(s))\,dW_j(s) \Big). \tag{6}$$

In this section, without loss of generality, we can set $t_0 = 0$ and $u(t_0) = u_0$. We will
vary the time moment $t_0$ in the next section. Equation (6) can be solved with the
help of the contraction mapping principle in $L^2(\Omega; C(0, T; H^\sigma(\mathbb{R})))$.
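The truncation can be made concrete with a short numerical sketch. It is only an illustration under stated assumptions: the particular bump-function construction of $\theta$, the periodic discretisation of the $H^\sigma$ norm (normalised so that $\sigma = 0$ gives the discrete $L^2$ norm), and the helper names `theta`, `sobolev_norm` and `truncate` are all hypothetical choices, not taken from the paper.

```python
import numpy as np

def _h(s):
    """exp(-1/s) for s > 0 and 0 otherwise (smooth on the real line)."""
    s = np.asarray(s, dtype=float)
    safe = np.where(s > 0, s, 1.0)              # avoid division by zero
    return np.where(s > 0, np.exp(-1.0 / safe), 0.0)

def theta(x):
    """Smooth cut-off: 1 on [-1, 1], 0 outside (-2, 2), values in [0, 1]."""
    ax = np.abs(np.asarray(x, dtype=float))
    num = _h(2.0 - ax)
    return num / (num + _h(ax - 1.0))           # denominator is never zero

def sobolev_norm(u, dx, sigma):
    """Discrete H^sigma norm on a periodic grid, defined on the Fourier side."""
    xi = 2.0 * np.pi * np.fft.fftfreq(u.size, d=dx)
    u_hat = np.fft.fft(u) * dx                  # approximates the continuous transform
    dxi = 2.0 * np.pi / (u.size * dx)
    weight = (1.0 + xi**2) ** sigma
    return np.sqrt(np.sum(np.abs(u_hat) ** 2 * weight) * dxi / (2.0 * np.pi))

def truncate(nonlinearity, u, dx, sigma, R):
    """Return theta_R(||u||_{H^sigma}) * nonlinearity(u), theta_R(x) = theta(x / R)."""
    return theta(sobolev_norm(u, dx, sigma) / R) * nonlinearity(u)
```

For states whose $H^\sigma$ norm is below $R$ the truncated nonlinearity coincides with the original one, and it vanishes identically once the norm exceeds $2R$; this is what makes $f_R$ and $g_R$ globally Lipschitz.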


Proposition 1 Let $\sigma > 1/2$, $u_0 \in L^2(\Omega; H^\sigma(\mathbb{R}))$ be $\mathcal{F}_0$-measurable and $T_0 > 0$.
Then (6) has a unique adapted solution $u \in L^2(\Omega; C(0, T_0; H^\sigma(\mathbb{R})))$. Moreover, it
depends continuously on the initial data $u_0$.


Proof We set $\mathcal{T}u(t) = \mathrm{RHS}(6)$. We will show that $\mathcal{T}$ is a contraction mapping in
$X_T = L^2(\Omega; C(0, T; H^\sigma(\mathbb{R})))$, provided $T > 0$ is sufficiently small, depending
only on $R$. Let $u_1, u_2$ be two adapted processes in $X_T$. Firstly, one can notice that

$$\|f_R(u_1) - f_R(u_2)\|_{H^\sigma} \le C(1 + R)R\, \|u_1 - u_2\|_{H^\sigma},$$

$$\|g_R(u_1) - g_R(u_2)\|_{H^\sigma} \le CR\, \|u_1 - u_2\|_{H^\sigma}.$$
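Indeed, $H^\sigma(\mathbb{R})$ possesses the algebra property for $\sigma > 1/2$ and $\partial_x K$ is bounded in
$H^\sigma(\mathbb{R})$. Then assuming $\|u_1\|_{H^\sigma} \ge \|u_2\|_{H^\sigma}$ without loss of generality one deduces

$$\begin{aligned}
\|g_R(u_1) - g_R(u_2)\|_{H^\sigma} &\le C \big\| \theta_R(\|u_1\|_{H^\sigma}) u_1^2 - \theta_R(\|u_2\|_{H^\sigma}) u_2^2 \big\|_{H^\sigma} \\
&\le C\, \theta_R(\|u_1\|_{H^\sigma})\, \|u_1^2 - u_2^2\|_{H^\sigma} + C \big| \theta_R(\|u_1\|_{H^\sigma}) - \theta_R(\|u_2\|_{H^\sigma}) \big|\, \|u_2^2\|_{H^\sigma} \\
&\le CR\, \|u_1 - u_2\|_{H^\sigma},
\end{aligned}$$

where we have used the estimate $|\theta_R(\|u_1\|_{H^\sigma}) - \theta_R(\|u_2\|_{H^\sigma})| \le \|\theta'\|_{L^\infty} R^{-1} \|u_1 - u_2\|_{H^\sigma}$,
which follows from the mean value theorem. The difference
between $f_R(u_1)$ and $f_R(u_2)$ can be estimated in the same way. Thus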


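$$\|\mathcal{T}u_1(t) - \mathcal{T}u_2(t)\|_{H^\sigma} \le \Big\| \int_0^t S(0, s) \big( f_R(u_1(s)) - f_R(u_2(s)) \big) ds \Big\|_{H^\sigma} + \Big\| \sum_j \gamma_j \int_0^t S(0, s) \big( g_R(u_1(s)) - g_R(u_2(s)) \big) dW_j(s) \Big\|_{H^\sigma} = I + II.$$

The first integral is estimated straightforwardly as

$$I \le \int_0^T \|f_R(u_1(s)) - f_R(u_2(s))\|_{H^\sigma} ds \le C(1 + R)R\,T\, \|u_1 - u_2\|_{C(0, T; H^\sigma)}.$$

The second one is estimated with the use of the Burkholder inequality [5] as

$$\mathbb{E} \sup_{0 \le t \le T} II^2 \le C\, \mathbb{E} \int_0^T \|g_R(u_1(s)) - g_R(u_2(s))\|_{H^\sigma}^2 ds \le C R^2 T\, \mathbb{E} \|u_1 - u_2\|_{C(0, T; H^\sigma)}^2.$$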


It is clear that time-continuity of $\mathcal{T}u_1$, $\mathcal{T}u_2$ follows from the factorisation
$S(t, t_0) = S(t - t_0) S_W(t, t_0)$ and the estimate $\|S_W g_R(u)\|_{H^\sigma} \le C R^2$, so we have a
stochastic convolution as in [5, Lemma 3.3]. Thus

$$\|\mathcal{T}u_1 - \mathcal{T}u_2\|_{X_T} \le C \big( (1 + R)R\,T + R\sqrt{T} \big) \|u_1 - u_2\|_{X_T},$$

and so there exists a small $T$ depending only on $R$ such that $\mathcal{T}$ has a unique
fixed point in $X_T$. Moreover, this estimate also gives continuous dependence
of the solution in $X_T$ on the initial data $u_0 \in L^2(\Omega; H^\sigma(\mathbb{R}))$. Clearly, the
solution can be extended to the whole interval $[0, T_0]$. □
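The regularisation affects the energy conservation. Indeed, in the Itô differential
form Eq. (6) reads

$$du = \Big[ -\partial_x K u + \frac12 \sum_j \gamma_j^2 \partial_x^2 u + f_R(u) + \sum_j \gamma_j^2 \partial_x g_R(u) \Big] dt + \sum_j \gamma_j \big( \partial_x u + g_R(u) \big)\, dW_j, \tag{7}$$

and so applying the Itô formula to the energy functional $H(u(t))$ defined by (2) with
the use of (7), one can easily obtain

$$dH(u) = \Big[ (\theta_R - 1) \int u^2 \partial_x K u\, dx + \theta_R (\theta_R - 1) \sum_j \gamma_j^2 \int \Big( \frac12 g K^{-1} g + u g^2 \Big) dx \Big] dt, \tag{8}$$

where $\theta_R = \theta_R(\|u\|_{H^\sigma})$ and $g = g(u)$.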


Indeed, assuming $\sigma > \sigma_0 + 2$ at first, we notice that the solution $u$ given by
Proposition 1 solves Eq. (7). Let us introduce the notation

$$\Psi(t)\,dt + \Phi(t)\,dW = \Psi(t)\,dt + \sum_j \gamma_j \Phi(t) e_j\, dW_j = \mathrm{RHS}(7).$$


Then Itô's formula reads

$$H(u(t)) = H(u_0) + \int_0^t \partial_u H(u(s)) \Psi(s)\,ds + \int_0^t \partial_u H(u(s)) \Phi(s)\,dW(s) + \frac12 \int_0^t \operatorname{tr} \partial_u^2 H(u(s)) (\Phi(s), \Phi(s))\,ds,$$


where the Fréchet derivatives are defined by

$$\partial_u H(u)\varphi = \int_{\mathbb{R}} \big( K^{-1/2} u\, K^{-1/2} \varphi + u^2 \varphi \big) dx,$$

$$\partial_u^2 H(u)(\varphi, \psi) = \int_{\mathbb{R}} \big( K^{-1/2} \varphi\, K^{-1/2} \psi + 2 u \varphi \psi \big) dx$$

at every $\varphi, \psi \in H^\sigma(\mathbb{R})$. Substituting these expressions together with the definitions
of $\Phi$ and $\Psi$ into Itô's formula one obtains (8). Let us, for example, calculate the
stochastic integral

$$\int_0^t \partial_u H(u(s)) \Phi(s)\,dW(s) = \sum_j \gamma_j \int_0^t \int_{\mathbb{R}} \big( K^{-1} u + u^2 \big) \big( \partial_x u + \theta_R(\|u\|_{H^\sigma})\, \partial_x K u^2 \big)\, dx\, dW_j,$$

which equals zero, as one can see by integrating by parts in the space integral. Similarly,
one calculates the other two integrals in the Itô formula. Thus we have proved (8)
for $\sigma > \sigma_0 + 2$. In order to lower the bound for $\sigma$, one would like to argue by
approximating the initial value $u_0$ via smooth functions and appealing to the continuous
dependence on $u_0$. However, there is a problem here, since $\theta_R$ in (8) depends
on $\sigma$. So even for smooth initial data the corresponding solution lies a
priori only in $H^\sigma$. This difficulty is overcome in the next statement, where we argue
similarly to [3].


Proposition 2 Let $\sigma_0 > 1/2$ and $\sigma > \max\{\sigma_0, 1\}$. Then (8) holds almost surely for
$u$ satisfying Eq. (6) given by Proposition 1.


Proof The main idea is to cut off high frequencies of the differential operator $\partial_x$
in (7) as follows. Let $P_\lambda$ be a Fourier multiplier with the symbol $\theta_\lambda$, $\lambda > 0$. It
is defined by the expression $\mathcal{F}(P_\lambda \psi) = \theta_\lambda \hat\psi$. Now we consider instead of (7) the
following regularisation

$$du = \Big[ -\partial_x K u + \frac12 \sum_j \gamma_j^2 \partial_x^2 P_\lambda^2 u + f_R(u) + \sum_j \gamma_j^2 \partial_x P_\lambda g_R(u) \Big] dt + \sum_j \gamma_j \big( \partial_x P_\lambda u + g_R(u) \big)\, dW_j \tag{9}$$

that has a strong solution. Indeed, it contains only bounded operators and the
corresponding mild equation has exactly the same form as Eq. (6) with $S^\lambda = S S_W^\lambda$
now instead of $S$, where

$$S_W^\lambda(t, t_0) = \exp\Big( \sum_j \gamma_j \partial_x P_\lambda \big( W_j(t) - W_j(t_0) \big) \Big).$$

So we can actually apply Proposition 1 to obtain $u = u_\lambda$ solving (9). Let $u = u_\infty$
stand for the solution of the original Eq. (6). Firstly, we will check that $u_\lambda \to u_\infty$ in
$L^2(\Omega; L^2(0, T_0; H^\sigma(\mathbb{R})))$ for any $\sigma > 1/2$ as $\lambda \to \infty$.

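Let $0 \le t \le T \le T_0$, where a positive small enough time moment $T$ is to be
chosen below. Then

$$\begin{aligned}
\|u_\lambda(t) - u_\infty(t)\|_{H^\sigma} &= \|\mathcal{T}^\lambda u_\lambda(t) - \mathcal{T}^\infty u_\infty(t)\|_{H^\sigma} \\
&\le \big\| \big( S^\lambda(t, 0) - S^\infty(t, 0) \big) u_0 \big\|_{H^\sigma} \\
&\quad + \Big\| \int_0^t \big( S^\lambda(t, s) - S^\infty(t, s) \big) f_R(u_\infty(s))\,ds \Big\|_{H^\sigma} \\
&\quad + \Big\| \int_0^t S^\lambda(t, s) \big( f_R(u_\lambda(s)) - f_R(u_\infty(s)) \big)\,ds \Big\|_{H^\sigma} \\
&\quad + \Big\| \big( S^\lambda(t, 0) - S^\infty(t, 0) \big) \sum_j \gamma_j \int_0^t S^\infty(0, s) g_R(u_\infty(s))\,dW_j(s) \Big\|_{H^\sigma} \\
&\quad + \Big\| \sum_j \gamma_j \int_0^t \big( S^\lambda(0, s) - S^\infty(0, s) \big) g_R(u_\infty(s))\,dW_j(s) \Big\|_{H^\sigma} \\
&\quad + \Big\| \sum_j \gamma_j \int_0^t S^\lambda(0, s) \big( g_R(u_\lambda(s)) - g_R(u_\infty(s)) \big)\,dW_j(s) \Big\|_{H^\sigma} \\
&= I_1 + \dots + I_6.
\end{aligned}$$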


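The terms $I_3$ and $I_6$ are estimated exactly as the analogous integrals $I$ and $II$ in the
proof of Proposition 1, namely,

$$I_3 \le C(1 + R)R\sqrt{T}\, \|u_\lambda - u_\infty\|_{L^2(0, T; H^\sigma)}$$

and

$$\mathbb{E} \sup_{0 \le t \le T} I_6^2 \le C\, \mathbb{E} \int_0^T \|g_R(u_\lambda(s)) - g_R(u_\infty(s))\|_{H^\sigma}^2 ds \le C R^2\, \mathbb{E} \|u_\lambda - u_\infty\|_{L^2(0, T; H^\sigma)}^2.$$

Thus

$$\mathbb{E} \int_0^T \big( I_3^2 + I_6^2 \big) dt \le C \big( (1 + R)^2 R^2 T^2 + R^2 T \big)\, \mathbb{E} \|u_\lambda - u_\infty\|_{L^2(0, T; H^\sigma)}^2$$

and so there exists a small $T > 0$ depending only on $R$ such that

$$\mathbb{E} \|u_\lambda - u_\infty\|_{L^2(0, T; H^\sigma)}^2 \le C\, \mathbb{E} \int_0^T \big( I_1^2 + I_2^2 + I_4^2 + I_5^2 \big) dt.$$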


One needs to show that the right-hand side of this expression tends to zero as
$\lambda \to \infty$. All four integrals are treated similarly. Indeed, let us look more
closely at the first one,

$$I_1^2 = \int |\hat u_0(\xi)|^2 (1 + \xi^2)^\sigma \Big| \exp\Big( i\xi \theta_\lambda(\xi) \sum_j \gamma_j W_j(t) \Big) - \exp\Big( i\xi \sum_j \gamma_j W_j(t) \Big) \Big|^2 d\xi,$$

which obviously tends to zero as $\lambda \to \infty$ for a.e. $\omega$ and any $t$. Hence $\mathbb{E} \int_0^T I_1^2\,dt \to 0$
by the dominated convergence theorem, since $I_1 \le 2 \|u_0\|_{H^\sigma}$. The integral of $I_4$
is estimated in exactly the same manner, with the stochastic integral of $S^\infty g_R(u_\infty)$
standing in place of $u_0$. The second integral

$$\mathbb{E} \int_0^T I_2^2\,dt \le T\, \mathbb{E} \int_0^T \int_0^T \big\| \big( S^\lambda(t, s) - S^\infty(t, s) \big) f_R(u_\infty(s)) \big\|_{H^\sigma}^2 ds\,dt \to 0$$

by the dominated convergence theorem, since $\| \dots \|_{H^\sigma}^2 \le C R^4 (1 + R)^2$. Finally, the
last integral

$$\mathbb{E} \int_0^T I_5^2\,dt \le T\, \mathbb{E} \sup_{t \in [0, T]} I_5^2 \le C T\, \mathbb{E} \int_0^T \big\| \big( S^\lambda(0, s) - S^\infty(0, s) \big) g_R(u_\infty(s)) \big\|_{H^\sigma}^2 ds \to 0$$

by the Burkholder inequality and the dominated convergence theorem, since
$\| \dots \|_{H^\sigma}^2 \le C R^4$.

Repeating this argument iteratively on subintervals of $[0, T_0]$ of size $T$ one
obtains that $u_\lambda \to u_\infty$ in $L^2(\Omega \times [0, T_0]; H^\sigma(\mathbb{R}))$.

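Let us calculate each term in the Itô formula for $u = u_\lambda$. As we shall see, the
corresponding stochastic integral is not zero, and moreover, it is difficult to pass to
the limit $\lambda \to \infty$ when treating the stochastic part. So instead of $H$ we consider at
first a sequence $H_n$, $n \in \mathbb{N}$, with the cubic term being cut off in the following way

$$H_n(u) = \|u\|_{\mathcal{H}}^2 + \frac13 \theta_n\big( \|u\|_{\mathcal{H}}^2 \big) \int u^3 dx$$

that clearly tends to $H(u)$ almost surely at any fixed time moment. The corresponding
Fréchet derivatives are defined by

$$\partial_u H_n(u)\varphi = \int_{\mathbb{R}} \Big[ \Big( 1 + \frac13 \theta_n'\big( \|u\|_{\mathcal{H}}^2 \big) \int u^3 dy \Big) K^{-1/2} u\, K^{-1/2} \varphi + \theta_n\big( \|u\|_{\mathcal{H}}^2 \big) u^2 \varphi \Big] dx,$$

$$\begin{aligned}
\partial_u^2 H_n(u)(\varphi, \psi) &= \int_{\mathbb{R}} \Big[ \Big( 1 + \frac13 \theta_n'\big( \|u\|_{\mathcal{H}}^2 \big) \int u^3 dy \Big) K^{-1/2} \varphi\, K^{-1/2} \psi + 2 \theta_n\big( \|u\|_{\mathcal{H}}^2 \big) u \varphi \psi \Big] dx \\
&\quad + \theta_n'\big( \|u\|_{\mathcal{H}}^2 \big) \int u^2 \varphi\, dx \int K^{-1/2} u\, K^{-1/2} \psi\, dy + \theta_n'\big( \|u\|_{\mathcal{H}}^2 \big) \int u^2 \psi\, dx \int K^{-1/2} u\, K^{-1/2} \varphi\, dy \\
&\quad + \frac13 \theta_n''\big( \|u\|_{\mathcal{H}}^2 \big) \int u^3 dx \int K^{-1/2} u\, K^{-1/2} \varphi\, dy \int K^{-1/2} u\, K^{-1/2} \psi\, dz
\end{aligned}$$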

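at every $\varphi, \psi \in H^\sigma(\mathbb{R})$. Substituting these into the stochastic integral one obtains the
following expression, which can be simplified by integration by parts:

$$\begin{aligned}
\int_0^t \partial_u H_n(u(s)) \Phi(s)\,dW(s) &= \sum_j \gamma_j \int_0^t \int_{\mathbb{R}} \Big[ \Big( 1 + \frac13 \theta_n'\big( \|u\|_{\mathcal{H}}^2 \big) \int u^3 dy \Big) K^{-1} u + \theta_n\big( \|u\|_{\mathcal{H}}^2 \big) u^2 \Big] \big( \partial_x P_\lambda u + \theta_R(\|u\|_{H^\sigma})\, \partial_x K u^2 \big)\, dx\, dW_j \\
&= \sum_j \gamma_j \int_0^t \theta_n\big( \|u\|_{\mathcal{H}}^2 \big) \int_{\mathbb{R}} u^2 \partial_x P_\lambda u\, dx\, dW_j,
\end{aligned}$$

where $u = u_\lambda$. We will show that this integral tends to zero as $\lambda \to \infty$.
That is exactly the place where we need the cut-off $\theta_n$. Applying some algebraic
manipulations to the space integral and the Burkholder inequality to the stochastic
integral, one deduces the estimate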


2 


č sup 
0<t<T 


To 2 
<CE f 62 (haO) ( i ŽOP ~ Dua()dx) dt 
0 R 


To 
< ce f e (tuoi) aO 
0 


(CP, = Duo Oli + CPA = DOGO = woop) at 


t 
f IHn (u(s)) D AW) 
0 


4 to ne = 
<CntB | (INP = Duco (Oyi + MO.) = otl) dt > 0 


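as $\lambda \to \infty$ for each fixed $n \in \mathbb{N}$. Note that the use of the functional $H_n$ instead of $H$
is important here. Similarly, we calculate the remaining two terms in the Itô formula

$$\begin{aligned}
\partial_u H_n(u)\Psi &+ \frac12 \operatorname{tr} \partial_u^2 H_n(u)(\Phi, \Phi) \\
&= (\theta_R - \theta_n) \int u^2 \partial_x K u\, dx + \theta_n \theta_R (\theta_R - 1) \sum_j \gamma_j^2 \int u g^2(u)\, dx \\
&\quad + \frac{\theta_R (\theta_R - 1)}{2} \sum_j \gamma_j^2 \int g(u) K^{-1} g(u)\, dx \\
&\quad + \frac{\theta_n}{2} \sum_j \gamma_j^2 \int \big( u^2 \partial_x^2 P_\lambda^2 u + 2 u (\partial_x P_\lambda u)^2 \big) dx \\
&\quad + \theta_n \theta_R \sum_j \gamma_j^2 \Big( 2 \int u (\partial_x P_\lambda u) g(u)\, dx - \int g(u) P_\lambda K^{-1} g(u)\, dx \Big) \\
&\quad + \frac13 \theta_n' \theta_R \int u^3 dy \Big( \frac{\theta_R - 1}{2} \sum_j \gamma_j^2 \int g(u) K^{-1} g(u)\, dx - \int u g(u)\, dx \Big) \\
&= J_1 + \dots + J_6,
\end{aligned}$$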


where as above $u = u_\lambda$. One can prove that for a.e. $\omega \in \Omega$ and $t \in [0, T_0]$ the first
three terms $J_1 + J_2 + J_3$ tend to the integrand of the right-hand side of
(8) in the successive limits, first as $\lambda \to \infty$ and then as $n \to \infty$. Both $J_4$ and
$J_5$ tend to zero as $\lambda \to \infty$. Meanwhile the last term $J_6$ stays bounded by $C/n$, and
so $\lim_{n \to \infty} \lim_{\lambda \to \infty} J_6 = 0$. Let us show, for example, that $J_4 \to 0$, which is the
most troublesome term in the sum, since this is the only place in the paper where
we make use of the fact $\sigma > 1$. The rest are treated similarly without this additional
restriction. Indeed,


J4 < C f (ud, Pu — P, (uðyu)) (P) — 1)ðyudx 


< C luai (Pa — Detooll zt + lua — tooll) 


that obviously tends to zero as $\lambda \to \infty$. This concludes the proof. □


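At this stage one cannot claim the energy conservation yet, so we will prove a
weaker result that will be sharpened later. Note that there exists $C_{\mathcal{H}} > 0$ such that

$$\|u\|_{\mathcal{H}}^2 \big( 1 - C_{\mathcal{H}} \|u\|_{\mathcal{H}} \big) \le H(u) \le \|u\|_{\mathcal{H}}^2 \big( 1 + C_{\mathcal{H}} \|u\|_{\mathcal{H}} \big), \tag{10}$$

which follows from the well-known embedding $H^{\sigma_0}(\mathbb{R}) \hookrightarrow L^\infty(\mathbb{R})$; recall that $\sigma_0 > 1/2$.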


Lemma 1 There exists a constant $T_1 > 0$ independent of $\omega$ such that if $u$ solving
Eq. (6) has $\|u\|_{\mathcal{H}} \le (2 C_{\mathcal{H}})^{-1}$ on some interval $[0, \tau]$ then $H(u) \le 2 H(u(0))$ on $[0, T_1 \wedge \tau]$.


Proof At first one can notice that as long as $\|u\|_{\mathcal{H}}$ stays bounded by $(2 C_{\mathcal{H}})^{-1}$, we
have

$$\frac12 \|u\|_{\mathcal{H}}^2 \le H(u) \le \frac32 \|u\|_{\mathcal{H}}^2.$$

Moreover, one can easily deduce from (8) the following bound

$$H(u(t)) \le H(u(0)) + C \int_0^t H(u(s))\,ds,$$

and so the proof is concluded by Grönwall's lemma. □


3 Proof of the Main Result


We construct a solution $u$ of (5) iteratively on the intervals $[0, T_1]$, $[T_1, 2T_1]$ and so
on. Here the interval size $T_1$ is defined by Lemma 1. Staying under the assumptions
of Theorem 1, we denote by $u_m$ solutions of Eq. (6) with $R = m \in \mathbb{N}$ given
by Proposition 1, where we subsequently set $t_0 = 0, T_1, 2T_1, \dots$. We define the
stopping times

$$\tau_m = \tau_m^{t_0} = \inf\{ t \in [t_0, T_0] : \|u_m(t)\|_{H^\sigma} > m \} \tag{11}$$

with the agreement $\inf \emptyset = T_0$. Starting with $t_0 = 0$ we first show the following
result.


Lemma 2 For a.e. $\omega \in \Omega$, any $m \in \mathbb{N}$ and each $t \in [0, \tau]$ with $\tau(\omega) =
\min\{\tau_m(\omega), \tau_{m+1}(\omega)\}$, it holds true that $u_m(t) = u_{m+1}(t)$.


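Proof We define

$$\tilde u_i(t) = \begin{cases} u_i(t) & \text{if } t \in [0, \tau], \\ S(t, \tau) u_i(\tau) & \text{if } t \in [\tau, T_0], \end{cases} \qquad i = m, m + 1.$$

At first we will show that $\tilde u_m$ and $\tilde u_{m+1}$ coincide in $X_T$ provided $T$ is sufficiently
small. Then we will finish the proof by an iteration procedure. The difference of
these functions has the form

$$\tilde u_{m+1}(t) - \tilde u_m(t) = S(t, 0) \int_0^{t \wedge \tau} S(0, s) \big( f(\tilde u_{m+1}(s)) - f(\tilde u_m(s)) \big)\,ds + S(t, 0) \sum_j \gamma_j \int_0^{t \wedge \tau} S(0, s) \big( g(\tilde u_{m+1}(s)) - g(\tilde u_m(s)) \big)\,dW_j(s),$$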

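where the stochastic integral is estimated via

$$\begin{aligned}
\mathbb{E} \sup_{0 \le t \le T} &\Big\| S_W(t, 0) \sum_j \gamma_j \int_0^t S(t - s) \chi_{\{s \le \tau\}}(s) S_W(0, s) \big( g(\tilde u_{m+1}(s)) - g(\tilde u_m(s)) \big)\,dW_j(s) \Big\|_{H^\sigma}^2 \\
&\le C\, \mathbb{E} \int_0^T \chi_{\{s \le \tau\}}(s) \big\| S_W(0, s) \big( g(\tilde u_{m+1}(s)) - g(\tilde u_m(s)) \big) \big\|_{H^\sigma}^2 ds \\
&\le C\, \mathbb{E} \int_0^T \chi_{\{s \le \tau\}}(s) \big( \|\tilde u_{m+1}(s)\|_{H^\sigma} + \|\tilde u_m(s)\|_{H^\sigma} \big)^2 \|\tilde u_{m+1}(s) - \tilde u_m(s)\|_{H^\sigma}^2 ds \\
&\le C (2m + 1)^2 T\, \mathbb{E} \sup_{[0, T]} \|\tilde u_{m+1} - \tilde u_m\|_{H^\sigma}^2
\end{aligned}$$

with the help of the Burkholder inequality for convolution with the unitary group $S$,
see [5, Lemma 3.3]. The first integral is estimated more straightforwardly, by an
argument similar to the one employed for $I$ in the proof of Proposition 1, and so one obtains

$$\|\tilde u_{m+1} - \tilde u_m\|_{X_T} \le C(m) \sqrt{T}\, \|\tilde u_{m+1} - \tilde u_m\|_{X_T}.$$

Hence $\tilde u_{m+1} = \tilde u_m$ on $[0, T]$ for a.e. $\omega \in \Omega$ provided $T$ is chosen sufficiently small
depending only on $m$. Thus we can iterate this procedure to show that $\tilde u_{m+1} = \tilde u_m$
on the whole interval $[0, T_0]$, which concludes the proof of the lemma. □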


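Our goal is to bound $\|u_m\|_{L^2(\Omega; C(0, T_1; H^\sigma))}$ by a constant independent of $m \in \mathbb{N}$, and
so we will need to estimate $\|f(u_m)\|_{H^\sigma}$ and $\|g(u_m)\|_{H^\sigma}$, in particular. This can be
easily done with the help of

$$\|\varphi \psi\|_{H^\sigma} \le C(\sigma, \sigma_0) \big( \|\varphi\|_{H^\sigma} \|\psi\|_{H^{\sigma_0}} + \|\varphi\|_{H^{\sigma_0}} \|\psi\|_{H^\sigma} \big),$$

which holds for any $\sigma > 0$ and $\sigma_0 > 1/2$, see for example [7, Estimate (3.12)].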
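For a.e. $\omega \in \Omega$ and any $m \in \mathbb{N}$, $t \in [0, T_0]$ we have

$$\|u_m(t)\|_{H^\sigma} \le \|u_0\|_{H^\sigma} + \int_0^t \|f_m(u_m(s))\|_{H^\sigma} ds + \Big\| \sum_j \gamma_j \int_0^t S(0, s) g_m(u_m(s))\,dW_j(s) \Big\|_{H^\sigma},$$

where $\|f(u_m(s))\|_{H^\sigma} \le C \big( \|u_m(s)\|_{H^{\sigma_0}} + \|u_m(s)\|_{H^{\sigma_0}}^2 \big) \|u_m(s)\|_{H^\sigma}$. Now taking
into account that $\|S(0, s) g_m(u_m(s))\|_{H^\sigma} \le C \|u_m(s)\|_{H^{\sigma_0}} \|u_m(s)\|_{H^\sigma}$, the
stochastic integral can be estimated by the Burkholder inequality, and so we obtain
for any $0 < T \le T_0$ the following inequality

$$\mathbb{E} \sup_{t \in [0, T]} \|u_m(t)\|_{H^\sigma}^2 \le 3\, \mathbb{E} \|u_0\|_{H^\sigma}^2 + C\, \mathbb{E} \int_0^T \big( \|u_m(t)\|_{H^{\sigma_0}}^2 + \|u_m(t)\|_{H^{\sigma_0}}^4 \big) \|u_m(t)\|_{H^\sigma}^2\, dt, \tag{12}$$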

where $C$ depends only on $\sigma_0$, $\sigma$, $T_0$, $\sum_j \gamma_j^2$. We will use this inequality iteratively
on the intervals $[0, T_0 \wedge k T_1]$, $k \in \mathbb{N}$, with $T_1$ found in Lemma 1. Let $\|u_0\|_{\mathcal{H}} \le
(5 C_{\mathcal{H}})^{-1}$ a.e. on $\Omega$. Consider the following stopping time

$$T_1^m = \inf\{ t \in [0, T_0] : \|u_m(t)\|_{\mathcal{H}} > (2 C_{\mathcal{H}})^{-1} \}.$$

Then a.e. $T_1 \le T_1^m$. Indeed, assuming the contrary $T_1 > T_1^m$ one can deduce from
(10) and Lemma 1 that

$$\|u_m(T_1^m)\|_{\mathcal{H}} \le \sqrt{2 H(u_m(T_1^m))} \le 2 \sqrt{H(u_0)} \le 2 \sqrt{1 + C_{\mathcal{H}} \|u_0\|_{\mathcal{H}}}\, \|u_0\|_{\mathcal{H}} \le 2 \sqrt{1 + \tfrac15}\, (5 C_{\mathcal{H}})^{-1} < (2 C_{\mathcal{H}})^{-1},$$

which contradicts the definition of the stopping time $T_1^m$ due to continuity of
$\|u_m\|_{\mathcal{H}}$. As a result $\|u_m\|_{\mathcal{H}}$ stays bounded by $(2 C_{\mathcal{H}})^{-1}$ on the interval $[0, T_1]$ for
a.e. $\omega$, and this simplifies (12) in the following way


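$$\mathbb{E} \sup_{t \in [0, T]} \|u_m(t)\|_{H^\sigma}^2 \le 3\, \mathbb{E} \|u_0\|_{H^\sigma}^2 + C \int_0^T \mathbb{E} \sup_{s \in [0, t]} \|u_m(s)\|_{H^\sigma}^2\, dt$$

holding true for any $0 < T \le T_1$. Hence by Grönwall's lemma we obtain

$$\|u_m\|_{L^2(\Omega; C(0, T_1; H^\sigma))}^2 \le 3 \|u_0\|_{L^2(\Omega; H^\sigma)}^2 e^{C T_1} = M,$$

where $M$ does not depend on $m \in \mathbb{N}$. Hence

$$\mathbb{P}(\tau_m \ge T_1) = \mathbb{P}\big( \|u_m\|_{C(0, T_1; H^\sigma)} \le m \big) \ge 1 - \frac{1}{m^2}\, \mathbb{E} \|u_m\|_{C(0, T_1; H^\sigma)}^2 \ge 1 - \frac{M}{m^2}$$

and so $[0, T_1] \subset \bigcup_{m \in \mathbb{N}} [0, \tau_m(\omega)]$ for a.e. $\omega \in \Omega$. Thus we can define $u$ on $[0, T_1]$ by
assigning $u = u_m$ on $[0, \tau_m]$. This is obviously a solution of (5) on $[0, T_1]$ satisfying
$dH(u) = 0$ and $\|u\|_{\mathcal{H}} \le (2 C_{\mathcal{H}})^{-1}$ for a.e. $\omega \in \Omega$.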

Now one can repeat the argument on $[T_1, 2T_1]$ by constructing new solutions
$u_m$ of Eq. (6) with the initial data $u(T_1)$ given at the time moment $t_0 = T_1$. The
stopping times $\tau_m$ are defined by (11) with $t_0 = T_1$. The fact that $\|u_m\|_{\mathcal{H}}$ does
not exceed the level $(2 C_{\mathcal{H}})^{-1}$ is guaranteed by the energy conservation, namely by
$H(u(T_1)) = H(u_0)$, in the same manner as above. The rest is similar, and so we
get a solution on $[T_1, 2T_1]$ with constant energy equal to $H(u_0)$. After several
repetitions of the argument we construct a solution on $[0, T_0]$.

It remains to prove uniqueness. Let $u_1, u_2 \in L^2(\Omega; C(0, T_0; H^\sigma(\mathbb{R})))$ solve
Eq. (5). For $R > 0$ we introduce

$$\tau_R = \inf\Big\{ t \in [0, T_0] : \max_{i = 1, 2} \|u_i(t)\|_{H^\sigma} > R \Big\}.$$

Clearly, for a.e. $\omega \in \Omega$ both $u_1$ and $u_2$ are solutions of (6) on $[0, \tau_R]$. By
Proposition 1 it holds true that $u_1 = u_2$ on $[0, \tau_R]$ for a.e. $\omega \in \Omega$. Taking $R \in \mathbb{N}$ and
exploiting the time-continuity of $u_1, u_2$ one obtains $u_1 = u_2$ on $[0, \lim_{R \to \infty} \tau_R]$ for
a.e. $\omega \in \Omega$. Now from sub-additivity and Chebyshev's inequality we deduce

$$\mathbb{P}(\tau_R \ge T_0) = \mathbb{P}\Big( \max_{i = 1, 2} \|u_i\|_{C(0, T_0; H^\sigma)} \le R \Big) \ge 1 - \frac{1}{R^2} \Big( \|u_1\|_{L^2(\Omega; C(0, T_0; H^\sigma))}^2 + \|u_2\|_{L^2(\Omega; C(0, T_0; H^\sigma))}^2 \Big) \to 1$$

as $R \to \infty$, proving $u_1 = u_2$ on $[0, T_0]$. This concludes the proof of Theorem 1.


Observation-Based Noise Calibration: An Efficient Dynamics for the Ensemble
Kalman Filter


Benjamin Dufée, Etienne Mémin, and Dan Crisan 


Abstract We investigate the calibration of the stochastic noise in order to guide the 
realizations towards the observational data used for the assimilation. This is done 
in the context of the stochastic parametrization under Location Uncertainty (LU) 
and data assimilation. The new methodology is rigorously justified by the use of the 
Girsanov theorem, and yields significant improvements in the experiments carried 
out on the Surface Quasi Geostrophic (SQG) model, when applied to Ensemble 
Kalman filters. The particular test case studied here shows improvements of the 
peak MSE from 85% to 93%. 


Keywords Stochastic parametrization · Modeling under location uncertainty ·
Noise calibration · Ensemble Kalman filters · Square root filters


1 Introduction 


Sequential data assimilation uses observational data to correct a set of realizations
given by a numerical model. When both the data and the model are high-dimensional,
the data assimilation methodology can be facilitated by a procedure that
guides the realizations towards the available observations. This is particularly helpful
in high dimensions as it enables the ensemble to focus on a restricted region of
the state space. That is what we intend to put forward in this paper. This work
relies on a stochastic parametrization of the underlying dynamical system based
on the Location Uncertainty (LU) principles, which rely on a decomposition of
the Lagrangian velocity into a large-scale smooth component and a random
time-uncorrelated component. In this setting, a stochastic transport operator plays the
role of the usual material derivative, see [1] for more details. This work aims at
adding a noise specifically calibrated to play a guiding role for the
realizations.

In a previous data assimilation study on the Surface Quasi Geostrophic (SQG)
model, the stochastic forecast was shown to provide better results than deterministic
techniques such as variance inflation with perturbation of the initial condition, see [2]
for details. The current study is a continuation of [2]. The noise calibration presented
here further improves the results presented in [2], particularly when the system starts
from poor or badly estimated initial conditions (for instance resulting from initial
estimations relying on regularized inverse problems). For such initial conditions,
which are generally too smooth and inaccurate, classical ensemble methods are
likely to run into difficulties. In this short paper, we first briefly recall the
principles of Location Uncertainty and how they apply to the SQG model. Then we
detail the procedure leading to the noise calibration, and finally describe and assess
the numerical experiments performed.


2 The Stochastic SQG Model Under Location Uncertainty 
(LU) 


The analysis in this paper is carried out on the 2D Surface Quasi-Geostrophic
(SQG) model. The SQG equations model an idealized dynamics for surface oceanic
currents. They involve many realistic non-linear features such as fronts or strong
multiscale eddies (see [3, 4] for details). The deterministic SQG model couples
a transport equation of the buoyancy field $b$, a kinematic condition and a 2D
divergence-free constraint:

$$D_t b = 0; \qquad b = \frac{N_{\mathrm{strat}}}{f_0} (-\Delta)^{1/2} \psi; \qquad v = \nabla^{\perp} \psi, \tag{1}$$

expressed in terms of the stream function $\psi$ and the velocity $v$, where $D_t$ is the material
derivative. The kinematic condition depends on the stratification $N_{\mathrm{strat}}$ and the
Coriolis frequency $f_0$.

The corresponding stochastic dynamics is derived from the Location Uncertainty
(LU) principles described in [1]. The full description and numerical analysis of the
LU-SQG model can be found in [5, 6]. This stochastic formalism models the impact
of the small scales on the flow component that is initially smooth in time. It relies
on the decomposition of the Lagrangian velocity of a fluid particle positioned at $x_t$
in a spatial domain $\Omega \subset \mathbb{R}^2$:

$$dx_t = v(x_t, t)\,dt + \sigma(x_t, t)\,dB_t, \tag{2}$$

in terms of a resolved component $v$ (referred to as the large-scale component in
the following) and $\sigma dB_t$, an unresolved highly oscillating random component, built
from a (cylindrical) Wiener process $B_t$ (i.e. a well-defined Brownian motion taking
values in a functional space) [7]. The increments of the latter component are
independent in time. Due to the lack of smoothness of the solution $x_t$, Eq. (2) is to be
understood rigorously in its integral form.

The random perturbation of velocity is Gaussian and has the following distribution:

$$\sigma dB_t \sim \mathcal{N}(0, Q\,dt), \tag{3}$$

where $Q$ is the covariance operator. This operator admits an orthonormal eigenfunction
basis $\{\phi_n(\cdot, t)\}_{n \in \mathbb{N}}$ with non-negative eigenvalues $(\lambda_n(t))_{n \in \mathbb{N}}$. This yields a
convenient spectral definition of the noise as

$$\sigma(x, t)\,dB_t = \sum_{n \in \mathbb{N}} \sqrt{\lambda_n(t)}\, \phi_n(x, t)\, d\beta_t^n, \tag{4}$$

where the $\beta^n$ are i.i.d. standard one-dimensional Brownian motions. From Eq. (4),
the noise variance tensor $a$ is then defined by

$$a(x, t) = \sum_{n \in \mathbb{N}} \lambda_n(t)\, \phi_n(x, t)\, \phi_n(x, t)^T. \tag{5}$$

Note that the variance tensor has the physical dimension of a viscosity
(i.e. m²/s). Indeed, as $\sigma dB_t$ is a distance, $a(x, t)\,dt = \mathbb{E}[\sigma dB_t (\sigma dB_t)^T]$ is a
squared distance. The procedure used to generate the orthonormal basis functions
determines the spatial structure of the noise. The one used in our experiments will
be presented later in this section.
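The spectral definition of the noise and of its variance tensor translates directly into array operations. The sketch below is an illustration only: it assumes a finite truncation to $N$ modes sampled at $P$ grid points, and the function names `variance_tensor` and `sample_noise_increment` as well as the array layout are hypothetical.

```python
import numpy as np

def variance_tensor(lam, phi):
    """a(x) = sum_n lam[n] * phi_n(x) phi_n(x)^T.

    lam: (N,) non-negative eigenvalues.
    phi: (N, 2, P) two-component modes sampled at P points.
    Returns an array of shape (2, 2, P).
    """
    return np.einsum("n,nip,njp->ijp", lam, phi, phi)

def sample_noise_increment(lam, phi, dt, rng):
    """Draw sigma dB_t = sum_n sqrt(lam[n]) * phi_n * dbeta^n, dbeta^n ~ N(0, dt)."""
    dbeta = rng.normal(scale=np.sqrt(dt), size=lam.size)
    return np.einsum("n,nip->ip", np.sqrt(lam) * dbeta, phi)

rng = np.random.default_rng(0)
lam = np.array([2.0, 1.0, 0.5])            # hypothetical eigenvalues
phi = rng.normal(size=(3, 2, 10))          # hypothetical modes (not orthonormalised)
a = variance_tensor(lam, phi)              # shape (2, 2, 10), symmetric at each point
increment = sample_noise_increment(lam, phi, dt=0.01, rng=rng)
```

Since each increment scales like $\sqrt{dt}$, the product of two increments scales like $dt$, which is the discrete counterpart of the dimensional remark above.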

While a deterministically transported tracer $\Theta$ has zero material derivative:
$D_t \Theta = \partial_t \Theta + v \cdot \nabla \Theta = 0$, in the LU framework, a stochastically transported
tracer cancels a related stochastic transport operator defined as:

$$\mathbb{D}_t \Theta := d_t \Theta + (v^* dt + \sigma dB_t) \cdot \nabla \Theta - \frac12 \nabla \cdot (a \nabla \Theta)\,dt, \tag{6}$$

where

$$d_t \Theta := \Theta(x, t + dt) - \Theta(x, t) \tag{7}$$

is the infinitesimal forward time increment of the tracer. The effective advection
velocity is defined by

$$v^* = v - \frac12 \nabla \cdot a, \tag{8}$$

the term $\sigma dB_t \cdot \nabla \Theta$ is a non-Gaussian multiplicative noise corresponding to the
tracer's transport by the small-scale flow, and the last term in (6) is a diffusion
term, as the variance tensor $a$ is positive definite. The expression of the stochastic
transport operator comes from a generalized Itô formula (Itô–Wentzell formula), see
[5] for more details.

The stochastic version of the SQG model is obtained by replacing the material
derivative $D_t b$ in Eq. (1) with the stochastic transport operator $\mathbb{D}_t b$:

$$\mathbb{D}_t b = d_t b + (v^* dt + \sigma dB_t) \cdot \nabla b - \frac12 \nabla \cdot (a \nabla b)\,dt = 0, \tag{9}$$

and an additional incompressibility constraint on the noise:

$$\nabla \cdot \sigma dB_t = 0. \tag{10}$$

In the case of a compressible random field, the modified advection incorporates
an additional term in Eq. (8) related to the noise divergence [5]. One essential
property of LU (for a divergence-free noise component) is the conservation of
energy for the transported random tracer, under the same ideal boundary conditions
as in the deterministic case:

$$d \int_\Omega \Theta^2(x)\,dx = 0, \tag{11}$$

and, very importantly, this energy conservation property holds pathwise (i.e. for any
realization of the Brownian noise), see [5, 8] for details. This property highlights
the strong relation between the LU-SQG version and the deterministic one.


Noise Generation The method used to generate the noise in this study relies on a
data-driven method called proper orthogonal decomposition (POD) to estimate the
empirical orthogonal functions in the spectral representation of Eq. (4). By a slight
abuse of notation in the following, this noise will be referred to as POD noise. We
give some brief details in what follows.


Considering a series of snapshots of the velocity field, this method consists in the
computation of the covariance tensor around the temporal mean of the series of
snapshots. Then its eigenvalues and eigenfunctions can be estimated in order to
reconstruct the large-scale variability (the first "modes" or eigenfunctions), and the
small-scale one (the smaller modes). In practice, this procedure is applied to
coarse-grained high-resolution snapshots of deterministic simulations. The latter modes
are the ones on which the noise is decomposed. These modes are divergence-free
and stationary by construction, so the global structure of the noise will not vary
in time. For chaotic geophysical models like this one, we can also use noises
computed online, such as the one used in our previous work [2], which have much better
uncertainty quantification but are also much more expensive. An extension of this
work to such noise is currently in progress. We refer to [6] for a precise description of
this procedure.
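The POD step described above can be sketched with a singular value decomposition of the fluctuation snapshots. This is a minimal illustration, not the procedure of [6]: it assumes that each snapshot has been flattened into a row vector, uses the plain Euclidean inner product (no quadrature weights), and leaves the choice of which modes carry the noise to the user. The function name `pod_modes` is hypothetical.

```python
import numpy as np

def pod_modes(snapshots, n_modes):
    """Proper orthogonal decomposition of a series of snapshots.

    snapshots: (T, D) array, one flattened velocity field per row.
    Returns (eigenvalues, modes): the n_modes largest eigenvalues of the
    empirical covariance around the temporal mean, in decreasing order,
    and the matching orthonormal modes as an (n_modes, D) array.
    """
    fluctuations = snapshots - snapshots.mean(axis=0)     # remove the temporal mean
    # The SVD of the fluctuations avoids forming the D x D covariance matrix.
    _, s, vt = np.linalg.svd(fluctuations, full_matrices=False)
    eigenvalues = s**2 / (snapshots.shape[0] - 1)
    return eigenvalues[:n_modes], vt[:n_modes]

rng = np.random.default_rng(0)
snapshots = rng.normal(size=(30, 50))      # synthetic stand-in for velocity snapshots
lam, modes = pod_modes(snapshots, n_modes=5)
```

The returned eigenvalues and modes play the roles of $\lambda_n$ and $\phi_n$ in Eq. (4) for a stationary noise.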


3 Girsanov Theorem and Noise Calibration


3.1 Change of Measure


Ensemble-based sequential data assimilation filters are composed of a forecasting
step of the ensemble to provide a sampling of the forecast distribution, and an
analysis step correcting the departure from the observations. The purpose of the
proposed noise calibration is to modify the forecast distribution, taking into account
the upcoming observation, in order to guide the forecast towards it. In the context of
transport equations such as in the SQG model, this extra guiding term is an added
drift in the noise $\sigma dB_t$, which was initially built to have zero mean. Allowing $\sigma dB_t$
to have a non-zero mean entails a modification of the transport equation in order to
rewrite it in terms of a centered noise. This is called the Girsanov transform, and it
consists in a change of underlying measure so that a non-centered noise becomes
centered under a new probability measure, up to a drift term accounting for this
change of measure. For now, $\sigma dB_t$ is defined on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and
we denote by $(\mathcal{F}_t)_t$ the filtration adapted to $\sigma dB_t$.

The Girsanov theorem (see [7] for details) states that if $(Y_t)_{0 \le t \le T}$ is a stochastic
process such that:

- $(Y_t)_{0 \le t \le T}$ is adapted with respect to the Wiener filtration $(\mathcal{F}_t)_{0 \le t \le T}$.
- For the current probability measure $\mathbb{P}$, we have, $\mathbb{P}$-almost surely,

$$\int_0^T Y_t^2\,dt < \infty.$$

- The process $(Z_t)_{0 \le t \le T}$ defined by

$$Z_t = \exp\Big( \int_0^t Y_s\,dB_s - \frac12 \int_0^t Y_s^2\,ds \Big) \tag{12}$$

is an $\mathcal{F}_t$-martingale,

then there exists a probability measure $\tilde{\mathbb{P}}$ under which:

- The process $(\tilde B_t)_{0 \le t \le T}$ defined by

$$\tilde B_t = B_t - \int_0^t Y_s\,ds \tag{13}$$

is a standard cylindrical Wiener process.
- The Radon-Nikodym derivative of $\tilde{\mathbb{P}}$ with respect to $\mathbb{P}$ is $Z_T$.
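The statement can be checked numerically in the simplest setting. The following Monte Carlo sketch assumes a scalar Brownian motion and a constant drift $Y_t = y$ (the value of `y`, the sample size and the seed are arbitrary illustrative choices): weighting samples drawn under $\mathbb{P}$ by $Z_T$ should make the shifted variable $\tilde B_T = B_T - yT$ look like a centered Gaussian of variance $T$.

```python
import numpy as np

rng = np.random.default_rng(1)
n_paths, T, y = 200_000, 1.0, 0.5                     # y: constant drift (illustrative)

B_T = rng.normal(scale=np.sqrt(T), size=n_paths)      # B_T sampled under P
Z_T = np.exp(y * B_T - 0.5 * y**2 * T)                # density (12) for Y_t = y
B_tilde_T = B_T - y * T                               # shifted process (13) at time T

mean_Z = Z_T.mean()                                   # estimates E_P[Z_T] = 1
mean_tilde = (Z_T * B_tilde_T).mean()                 # estimates the mean of B~_T under P~
var_tilde = (Z_T * B_tilde_T**2).mean()               # estimates the variance of B~_T under P~
```

Up to Monte Carlo error, `mean_Z` is close to one, `mean_tilde` to zero and `var_tilde` to $T$, in line with $\tilde B$ being a Wiener process under the new measure.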


Let us denote by $(\Gamma_t)_{0 \le t \le T}$ the drift we intend to add to the noise. With such a
change of measure, let us see how Eq. (9) is modified. According to Eq. (13), we
have

$$dB_t = d\tilde B_t + \Gamma_t\,dt, \tag{14}$$

so the stochastic transport operator can be rewritten as

$$\mathbb{D}_t b = d_t b + \big( v^* dt + \sigma [d\tilde B_t + \Gamma_t\,dt] \big) \cdot \nabla b - \frac12 \nabla \cdot (a \nabla b)\,dt \tag{15a}$$

$$= d_t b + \big( v^* dt + v_\Gamma\,dt + \sigma d\tilde B_t \big) \cdot \nabla b - \frac12 \nabla \cdot (a \nabla b)\,dt, \tag{15b}$$

where

$$v_\Gamma = \sum_{k=1}^K \gamma_k \phi_k \tag{16}$$

is the velocity drift entailed by the Girsanov transform and we assume that $\Gamma_t =
\Gamma = (\gamma_1, \dots, \gamma_K)$ is constant on a small time step $dt$, which will be the case for the
discretized numerical scheme that we use.

As a result, under the probability measure $\tilde{\mathbb{P}}$, (15) has the same form as Eq. (9)
since $\tilde B$ is indeed a centered cylindrical Wiener process under $\tilde{\mathbb{P}}$, but with an added
drift in the advection velocity.


3.2 Computation of the Girsanov Drift 


We now describe how to compute $\Gamma$ in order to guide the forecast towards the next
observation.

Let us start from a given time $t_1$ where a complete buoyancy and velocity field is
available. The next observation $b^{\mathrm{obs}}(\cdot, t_2)$ is assumed to be available at time $t_2$ and
$L$ numerical time steps are performed until then ($t_2 - t_1 = L\delta_t$, where $\delta_t$ is the time
discretization step).

At time $t_1$, a rough prediction of the velocity at time $t_2$ can be estimated with the
current velocity (which, more precisely, comes from previous stochastic iterations,
but is $\mathcal{F}_{t_1}$-measurable), namely


b°?S (x + u(x, t1)L6y, t2) := b(x, t2), (17) 


that stands for the backward-registered observation with respect to the current 
deterministic velocity. This way the error made is 


A,b(x) = bœ, t) — b(x, th). (18) 
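As an illustration of the backward registration in (17) and of the error field in (18), here is a minimal NumPy sketch. It assumes, hypothetically, that the observation has already been interpolated onto the doubly periodic simulation grid, that the velocity is given in physical units, and that bilinear interpolation is used (the text mentions bilinear interpolations, but the function names and arguments below are illustrative and not taken from the authors' code).

```python
import numpy as np

def bilinear_periodic(field, xq, yq):
    """Bilinear interpolation of a doubly periodic field at fractional grid coordinates."""
    ny, nx = field.shape
    x0 = np.floor(xq).astype(int)
    y0 = np.floor(yq).astype(int)
    tx = xq - x0                      # fractional offsets in [0, 1)
    ty = yq - y0
    x0, y0 = x0 % nx, y0 % ny         # wrap indices (periodic domain)
    x1, y1 = (x0 + 1) % nx, (y0 + 1) % ny
    return ((1 - tx) * (1 - ty) * field[y0, x0] + tx * (1 - ty) * field[y0, x1]
            + (1 - tx) * ty * field[y1, x0] + tx * ty * field[y1, x1])

def backward_register(b_obs, vx, vy, n_steps, dt, dx):
    """Sample b_obs at x + v(x) * L * dt, i.e. the pseudo-observation of Eq. (17)."""
    ny, nx = b_obs.shape
    jj, ii = np.meshgrid(np.arange(ny), np.arange(nx), indexing="ij")
    xq = ii + vx * n_steps * dt / dx  # displaced positions, in grid cells
    yq = jj + vy * n_steps * dt / dx
    return bilinear_periodic(b_obs, xq, yq)

def registration_error(b_obs, b_current, vx, vy, n_steps, dt, dx):
    """Error field of Eq. (18): registered observation minus current buoyancy."""
    return backward_register(b_obs, vx, vy, n_steps, dt, dx) - b_current
```

With a zero velocity the registered field coincides with the observation, and a uniform displacement by an integer number of cells reduces to a circular shift.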


So $\tilde b(x, t_2)$ is a value taken in a modified observation field, because $b^{obs}$ is advected
by the current velocity $v(\cdot, t_1)$. For this reason we consider that the backward-registered
observation used for the calibration does not have the same nature as
the raw observation used for data assimilation. It constitutes a pseudo-observation,
for which we can consider that the error due to the imprecision of the backward
registration (ensuing in particular from successive bilinear interpolations) is much
larger than the observation noise, and almost uncorrelated with the latter. In the second
case, only the raw observation is used for the Kalman filter, corresponding only to
the observation noise. The aim is now to calibrate the current velocity by adding a
Girsanov drift $v_\Gamma = \sum_{k=1}^{K}\gamma_k\phi_k$, such that the solution of the following transport
equation

$$b^{obs}\Big(x + v(x, t_1)L\delta_t + v_\Gamma L\delta_t + \sum_{k=1}^{K}\phi_k\sqrt{L\delta_t}\,\beta_k,\ t_2\Big) = b(x, t_1) \tag{19}$$


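is approximated in a least-squares sense. In other words, we solve the following
minimization problem:

$$\min_{\Gamma}\int_\Omega\Big|\,\mathbb{E}\Big[b^{obs}\Big(x + v(x, t_1)L\delta_t + v_\Gamma L\delta_t + \sum_{k=1}^{K}\phi_k\sqrt{L\delta_t}\,\beta_k,\ t_2\Big)\Big] - b(x, t_1)\Big|^2 dx. \tag{20}$$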


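This can be rewritten as

$$\min_{\Gamma}\int_\Omega\Big(\Delta_t b + \nabla\tilde b\cdot v_\Gamma L\delta_t - \frac{1}{2}\nabla\tilde b\cdot(\nabla\cdot a)L\delta_t - \frac{1}{2}\nabla\cdot(a\nabla\tilde b)L\delta_t\Big)^2 dx.$$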


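Using the identities

$$\nabla\cdot a = \sum_{k=1}^{K}(\phi_k\cdot\nabla)\phi_k\ ;\qquad \nabla\cdot(a\nabla b) = \sum_{k=1}^{K}(\phi_k\cdot\nabla)(\phi_k\cdot\nabla b), \tag{21}$$

we rewrite the minimization problem as

$$\min_{\Gamma}\int_\Omega\Big(\Delta_t b + \nabla\tilde b\cdot\Big(\sum_{k=1}^{K}\gamma_k\phi_k\Big)L\delta_t - \frac{1}{2}\sum_{k=1}^{K}\big(\nabla\tilde b\cdot F_k + G_k(\tilde b)\big)L\delta_t\Big)^2 dx \tag{22}$$

where

$$F_k = (\phi_k\cdot\nabla)\phi_k\ ;\qquad G_k(\tilde b) = (\phi_k\cdot\nabla)(\phi_k\cdot\nabla\tilde b).$$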


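Denoting by $J$ the integrand, we have

$$\frac{\partial}{\partial\gamma_i}\int_\Omega J\,dx = 2\int_\Omega(\nabla\tilde b\cdot\phi_i)L\delta_t\Big[\Delta_t b + \nabla\tilde b\cdot\Big(\sum_{k=1}^{K}\gamma_k\phi_k\Big)L\delta_t - \frac{1}{2}\sum_{k=1}^{K}\big(\nabla\tilde b\cdot F_k + G_k(\tilde b)\big)L\delta_t\Big]dx. \tag{23}$$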


Finally, we add a regularization term $\alpha\|v_\Gamma\|_2^2 = \alpha\sum_{k=1}^{K}\lambda_k\gamma_k^2$, where $\lambda_k$ is the
eigenvalue of the Q-eigenfunction $\phi_k$ in Eq. (22), to ensure the uniqueness of the
solution of the proposed minimization problem; $\alpha$ needs to be tuned properly.
As a result, the minimization problem can be written as an inverse problem


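$$A\Gamma = c \tag{24}$$

where

$$A_{ij} = \int_\Omega(\nabla\tilde b\cdot\phi_i)(\nabla\tilde b\cdot\phi_j)\,dx + \frac{\alpha\lambda_i}{(L\delta_t)^2}\,\delta_{ij} \tag{25a}$$

$$c_i = \int_\Omega(\nabla\tilde b\cdot\phi_i)\Big[-\frac{\Delta_t b}{L\delta_t} + \frac{1}{2}\sum_{k=1}^{K}\big(\nabla\tilde b\cdot F_k + G_k(\tilde b)\big)\Big]dx. \tag{25b}$$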


The parameter $\alpha$ is fixed a priori in order to control the resulting Euclidean norm of
$v_\Gamma$, $\|v_\Gamma\|_2$. Large values of $\alpha$ lead to very small corrections ($\Gamma$ tends to $(0, \ldots, 0)$
when $\alpha$ goes to $+\infty$) whereas small values yield very strong and noisy drifts, as
we get closer to an ill-posed problem. For now, we tune $\alpha$ empirically and iteratively:
we increase it until the resulting norm of $v_\Gamma$ is under a given threshold.
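The regularized solve and the empirical tuning loop can be sketched as follows. This is only an illustration under stated assumptions: the matrix `A` and vector `c` are taken as given (without the regularization term), constant factors are absorbed into `alpha`, the modes are assumed orthogonal with squared norm `lam[k]` so that the drift norm is the square root of the sum of `lam[k] * gamma[k]**2`, and the geometric increase of `alpha` (`alpha0`, `factor`) is a hypothetical choice, since the text only states that the parameter is increased until the norm falls under a threshold.

```python
import numpy as np

def solve_girsanov_drift(A, c, lam, alpha):
    """Solve the regularized system (A + alpha * diag(lam)) gamma = c."""
    return np.linalg.solve(A + alpha * np.diag(lam), c)

def tune_alpha(A, c, lam, max_norm, alpha0=1e-6, factor=2.0, max_iter=200):
    """Increase alpha until the norm of the drift is below max_norm."""
    alpha = alpha0
    for _ in range(max_iter):
        gamma = solve_girsanov_drift(A, c, lam, alpha)
        drift_norm = np.sqrt(np.sum(lam * gamma ** 2))  # norm of sum_k gamma_k phi_k
        if drift_norm <= max_norm:
            return alpha, gamma
        alpha *= factor                                  # stronger regularization
    raise RuntimeError("no alpha found within max_iter iterations")
```

Each ensemble member would call `tune_alpha` independently, which is consistent with the remark that the procedure parallelizes ensemble-wise.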


4 Experiments 


This section details the numerical experiments carried out in this work. The goal is 
to study the benefits brought by a noise-calibrated forecast in an up-to-date version 
of a localized ensemble Kalman filter. In particular we wish to observe whether or 
not the noise calibration brings by itself an efficient and practical improvement of 
the assimilation step. 

Ensemble Kalman filters (see e.g. [9] for details) constitute a well-known family
of data assimilation methods. They rely on an ensemble of realizations (called
ensemble members) of a dynamical system $(x_n^f)_{n=1,\ldots,N}$ coming from the forecast
step, and give as an output another set of members $(x_n^a)_{n=1,\ldots,N}$. Each posterior
ensemble member $x_n^a$ is obtained as a linear combination of the prior ensemble
members $(x_n^f)_{n=1,\ldots,N}$ in order to minimize the distance between the ensemble and
the observation in some sense.
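To make the "linear combination of prior members" concrete, the following sketch implements a generic ensemble transform (square root) Kalman update. It is not the localized ESRF used in the experiments: localization is omitted, the observation operator `H` is assumed linear, the observation error covariance is assumed diagonal, and all names are illustrative.

```python
import numpy as np

def etkf_update(Xf, y, H, R_diag):
    """Ensemble transform Kalman update without localization.

    Xf: (n_state, N) prior ensemble, y: (n_obs,) observation,
    H: (n_obs, n_state) linear observation operator,
    R_diag: (n_obs,) observation error variances.
    """
    N = Xf.shape[1]
    x_mean = Xf.mean(axis=1, keepdims=True)
    Xp = Xf - x_mean                         # prior anomalies
    Yf = H @ Xf
    y_mean = Yf.mean(axis=1, keepdims=True)
    Yp = Yf - y_mean                         # anomalies in observation space
    C = Yp.T / R_diag                        # Yp^T R^{-1}, shape (N, n_obs)
    Pa_tilde = np.linalg.inv((N - 1) * np.eye(N) + C @ Yp)
    w_mean = Pa_tilde @ C @ (y[:, None] - y_mean)
    # Symmetric square root of (N - 1) * Pa_tilde gives the anomaly weights.
    evals, evecs = np.linalg.eigh((N - 1) * Pa_tilde)
    W = evecs @ np.diag(np.sqrt(evals)) @ evecs.T
    # Each posterior member is the prior mean plus a combination of prior anomalies.
    return x_mean + Xp @ (w_mean + W)
```

Because every posterior member lies in the span of the prior members, a bias shared by all of them (such as an underestimated amplitude) cannot be removed by this step alone, which is the motivation for calibrating the forecast.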
One important assumption of the classical EnKF is that the observation
and model noise are uncorrelated. This observation-calibrated forecast could
imply that the latter assumption no longer holds. Still, the discussion following
Eq. (18) on the nature of the observation explains why we can consider the
forecast and observation noise to be uncorrelated. If this assumption proves
invalid, we refer to the work in [10] to rigorously justify the introduction of
an observation-dependent forecast. In this work, both Kalman and particle filter
equations were rewritten in terms of the conditional expectation with respect to the
underlying sequence of current and past observations.

The stochastic simulations are
run on a double-periodic simulation grid, $G_s$, of size 64 x 64 points and of physical
size 1000 km x 1000 km, meaning that two neighboring points are approximately
15 km apart. An observation is assumed to be available every day (i.e. every 600
time steps of the dynamics) on a coarser observation grid, $G_o$, which is a subset
of $G_s$ of size 16 x 16. It is generated as follows: a trajectory of buoyancy $(z_t)_t$ is
run from the deterministic model (PDE) at a very fine resolution grid $G_f$, of size
512 x 512. Then a convolution-decimation procedure $D$ is applied in order to fit
to the targeted simulation grid $G_s$. It consists of the composition of a Gaussian
filter and a decimation operator subsampling one pixel out of two. It has to be
iterated three times in our case to fit the correct resolution. This is done in order
to respect Shannon's theorem and to avoid spectrum folding. A projection operator
$P$ is applied from $G_s$ to $G_o$, and we finally add an observation noise to get the
observation

$$b^{obs}(t) = P\circ D(z_t) + \eta_t\ ;\qquad \eta_t\sim\mathcal{N}(0, R)\ \text{ and }\ R = r^2 I_M, \tag{26}$$

where $R$ is the diagonal observation covariance matrix and $M$ is the number of
points on the observation grid.
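The observation pipeline described above (three Gaussian-filter-plus-decimation passes from 512 x 512 to 64 x 64, projection onto the 16 x 16 observation grid, then additive Gaussian noise) can be sketched as follows. The filter width `sigma`, the implementation of the filter in Fourier space and the choice of a regular stride for the projection are assumptions made for illustration; the text does not specify them.

```python
import numpy as np

def gaussian_filter_periodic(field, sigma):
    """Gaussian low-pass filter on a periodic grid; sigma is given in grid cells."""
    ny, nx = field.shape
    ky = 2 * np.pi * np.fft.fftfreq(ny)
    kx = 2 * np.pi * np.fft.fftfreq(nx)
    transfer = np.exp(-0.5 * sigma ** 2 * (ky[:, None] ** 2 + kx[None, :] ** 2))
    return np.real(np.fft.ifft2(np.fft.fft2(field) * transfer))

def convolution_decimation(field, n_iter=3, sigma=1.0):
    """Apply (filter, keep one pixel out of two) n_iter times: 512 -> 64 for n_iter = 3."""
    for _ in range(n_iter):
        field = gaussian_filter_periodic(field, sigma)[::2, ::2]
    return field

def make_observation(z_fine, r, rng, obs_stride=4):
    """Observation of Eq. (26): P o D(z) plus noise of standard deviation r."""
    b_sim = convolution_decimation(z_fine)          # fine grid -> simulation grid
    b_obs = b_sim[::obs_stride, ::obs_stride]       # simulation grid -> observation grid
    return b_obs + r * rng.standard_normal(b_obs.shape)
```

Filtering before each decimation removes the wavenumbers that the coarser grid cannot represent, which is the role assigned to the procedure in the text (avoiding spectrum folding).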


Numerical Setup The simulations have been performed with a pseudo-spectral
code in space (see [6] for details). The time-scheme is a fourth-order Runge-Kutta
scheme for the deterministic PDE, and an Euler-Maruyama scheme for the SPDEs.
We use a standard hyperviscosity model to dissipate the energy at the resolution
cut-off with a hyperviscosity coefficient $\beta = (5\times 10^{29}\ \mathrm{m^8\,s^{-1}})\,M_x^{-8}$, where $M_x$ is
the grid resolution [6].


The test case considered in this study is the following: an ensemble of N = 
100 ensemble members is started from the very same initial condition at day 0, 
which consists in two cold vortices to the north and two warm vortices to the south. 
However, the amplitude of the initial vortices is underestimated compared to the 
initial condition used for the deterministic run (considered as the truth) by 20%, as 
shown in Fig. 1. We refer to [2] for a mathematical expression of this field. 

In this experiment, we study the differences of efficiency of the localized 
Ensemble Square Root Filter (an up-to-date version of the Ensemble Kalman filter, 
see for instance [11] for details of the square root filters (ESRF) and [12] for a 
description of the observation covariance localization procedure) with both
noise-calibrated forecast and classical stochastic simulations. We also refer to [13] for the
extension of the square root filter for additive forecast noise based on covariance
transformation, where the advantages of additional model error in the forecast step
are shown.

Fig. 1 Initial conditions for the truth (on the left) and for each stochastic run (on the right, common
to all ensemble members). We enforce an underestimation of the amplitude of the initial vortices
of 20%

In both cases, starting from the underestimated initial condition, the stochastic
dynamics is simulated using the POD noise with $K = 10$ modes. An observation
is provided each day (i.e. every 600 time steps of the SPDE), with the observation
error parameter set to $r = 10^{-5}$ in (26), which corresponds to a weak (but not
negligible, 1% of the maximum amplitude in the initial buoyancy field) noise on the
observation. The localization radius is set to $l_{obs}$ here, where $l_{obs}\approx 60$ km denotes
the distance between two neighboring observational sites, as it provided the best
results for both cases.

The typical behaviour of the vortices, at least at the beginning of the simulation, 
is to spin with no translation of the cores. In our case, the true vortices will spin 
much faster than those in the biased stochastic runs. The goal of calibration is then 
to speed these vortices up in order to get them closer to the truth. 

The forecast is calibrated at each time step of the SPDE, using the upcoming
observation. Multiple values were tried for the regularization parameter
$\alpha$, or alternatively for the upper bound allowed for the $L^2$-norm of the Girsanov
drift $v_\Gamma$. Figure 2 compares the MSE over time for the whole range of parameters
tested here, together with the same experiment without noise calibration. For the latter,
the LESRF has a difficult task, as it tries to find linear combinations of the prior
ensemble members, which all have an underestimated velocity, to get closer to the
observation. This is a general issue for ensemble methods (as well as for particle
filters), which are neither able nor designed to correct the bias if this correction is not
made in the forecast. By contrast, the LU calibration offers an additional degree of
freedom to guide the ensemble towards the observation. This procedure significantly
improves the results in terms of MSE. At day 13, when the MSE is maximal for
the usual case, we observe an improvement from 85% to 93% depending on the
parameters tested. The case of the underestimation is an example, but we expect
this procedure to be efficient in any situation in which all ensemble members
have a similar problem of bias, bad amplitude estimation, artefacts, unsymmetrical
features, etc. With a reasonably small ensemble size, which is generally the case in
practice, this is likely to occur if the initial conditions have such features.

Fig. 2 Comparison of MSE along time between the non calibrated forecast (in black) and all the
different parameters tested here for the noise calibration. The snapshots shown in Fig. 3 are taken
at day 15 (black dashed line)

As explained previously, the regularization parameter $\alpha$ controls the amplitude of the
allowed correction drift. In our experiments, all parameters tested yield significant
improvements compared to the classical case; still, a good trade-off seems to be
found with a control of $\|v_\Gamma\|_2$ between 70 and 150. Starting from 150, we observe
higher MSE in the very first days, most likely due to a lack of constraint on the
inverse problem. In addition to the MSE results, we show in Fig. 3 a more visual
example of what calibration does. At day 15, the configuration of the truth is that all
four vortices are horizontal. Without calibration (first row), the vortices are slanted
because of the initial underestimation of the velocity. The velocity field has not been
properly corrected. On the other hand, the LU calibration offers a more reliable
prediction, as we recover the global shape of the vortices, with additional spread
around the mean.

Finally, we show in Fig. 4 an insight into how the Girsanov correction $v_\Gamma$ behaves
in time. As the structure of the noise is stationary, so is the structure of $v_\Gamma$ because
it relies on the same modes as the noise. What is interesting is the evolution of the
amplitude of this field, which decreases in time, meaning that most of the calibration
work is done in the very first days of simulation, and once the forecast manages
to get closer to the truth, the need for calibration is less crucial and the Girsanov
correction gets weaker.

Fig. 3 Comparison between the ensemble mean (left) and the ensemble standard deviation (right)
maps, with and without calibration, at day 15 with the high-resolution truth

Fig. 4 Vorticity of the Girsanov drift $v_\Gamma$ computed for one ensemble member at the first time step
after the initial condition (left) and at the first time step after day 17 (right)


5 Conclusion 


The findings of this paper show the ability of a data-driven noise calibration 
procedure to improve significantly the assimilation by EnKF of a system initialized 
with an underestimated initial condition. 

As already mentioned in Sect. 2, we intend to extend this setting to non-stationary
noises, as they were shown to be associated with a better quantification of the
uncertainty (see [6] for details). Regarding computational effort, the calibration
procedure is intrinsically parallelizable ensemble-wise, and the techniques used are
close to optical flow estimation procedures, for which efficient solutions exist. The
tuning step of $\alpha$ is the most expensive step for now, for which more sophisticated
methods could be envisaged.
A Two-Step Numerical Scheme in Time
for Surface Quasi Geostrophic Equations
Under Location Uncertainty


Camilla Fiorini, Pierre-Marie Boulvard, Long Li, and Etienne Mémin 


Abstract In this work we consider the surface quasi-geostrophic (SQG) system 
under location uncertainty (LU) and propose a Milstein-type scheme for these 
equations, which is then used in a multi-step method. The SQG system considered 
here consists of one stochastic partial differential equation, which models the 
stochastic transport of the buoyancy, and a linear operator linking the velocity and 
the buoyancy. In the LU setting, the Euler-Maruyama scheme converges with weak 
order 1 and strong order 0.5. Our aim is to develop higher order schemes in time, 
based on a Milstein-type scheme in a multi-step framework. First we compared 
different kinds of Milstein schemes. The scheme with the best performance is 
then included in the two-step scheme. Finally, we show how our two-step scheme 
decreases the error in comparison to other multi-step schemes. 


1 Introduction 


The main aim of the modelling under location uncertainty (LU) consists in 
simulating on coarse meshes an enriched system mimicking a high resolution 
deterministic chaotic dynamics. Such LU models allow one to recover phenomena 
such as backscattering, dissipation and reorganisation on very coarse meshes. 
Furthermore, it provides a natural framework for uncertainty quantification analysis 
[14]. The LU framework, first introduced in [11], is based on the decomposition of 
the Lagrangian velocity into two components: a large-scale smooth component and 
a small-scale fast oscillating one. This decomposition leads to a stochastic transport
operator, and one can, in turn, develop the stochastic version of classical fluid- 
dynamics systems derived from the Navier-Stokes equations. SQG in particular 
consists of one stochastic partial differential equation (SPDE), which models the 
stochastic transport of the buoyancy, and a linear operator relating the velocity and 
the buoyancy: 


$$\begin{aligned}
db_t &= \frac{1}{2}\nabla\cdot(a\nabla b_t)\,dt - v^*\cdot\nabla b_t\,dt - \nabla b_t\cdot\sigma\,dB_t,\\
b_t &= N(-\Delta)^{1/2}\psi,\\
u &= \nabla^{\perp}\psi,
\end{aligned} \tag{1}$$

where $b_t$ is the buoyancy at time $t$, $u$ the large-scale smooth velocity, $N$ a constant
depending on the vertical oscillation frequency of the buoyancy and a Coriolis
parameter, $B$ a Wiener process, $\psi$ the stream function and $v^* = u - \frac{1}{2}\nabla\cdot a + \sigma\nabla\cdot\sigma$
is a corrected velocity associated with the effect of the noise inhomogeneity on
the advected variables. The spatial correlations of the noise are given through an
integral kernel operator $\sigma$ (here assumed deterministic and symmetric for the sake of
simplicity), and the variance matrix, $a$, given by the matrix kernel of the operator
$\sigma\sigma$, provides a local measure of the noise strength. For more details on the derivation
of this system, see [10, 13]. In the rest of this work we will mainly focus on the
first equation, and the last two will be condensed in $u = H(b)$. Concerning the
modelling of the noise, we use the equivalent convenient spectral definition:


$$\sigma\,dB_t = \sum_m\phi^m\,d\beta_t^m,$$

where $\beta^m = \beta^m(t)$ are independent one-dimensional standard Brownian motions
and $\phi^m = [\phi_x^m, \phi_y^m]^T(x)$ are basis functions. The number of terms involved in the
sum is in theory infinite, but in numerical applications a truncation is considered. In
the definition of the numerical schemes we will thus assume that it is a finite sum.
For the computation of the basis functions, two strategies are possible: an offline
strategy, where they are defined from the eigenfunctions of an empirical covariance
tensor built from high-resolution data as described in [10, 13]; or an online strategy, where
the functions are updated during the simulation and in this case they are a function
of the buoyancy $b$. With this representation, the variance tensor reads:

$$a = \sum_m\phi^m(\phi^m)^T.$$
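As a small illustration of the truncated spectral representation, the sketch below builds the variance tensor and one noise increment from a finite set of basis functions. The array layout `(M, 2, ny, nx)` for the basis and the function names are assumptions made for the example.

```python
import numpy as np

def variance_tensor(phi):
    """a = sum_m phi^m (phi^m)^T at every grid point.

    phi: array of shape (M, 2, ny, nx); returns an array of shape (2, 2, ny, nx).
    """
    return np.einsum("miyx,mjyx->ijyx", phi, phi)

def noise_increment(phi, dt, rng):
    """One realization of sigma dB_t = sum_m phi^m dbeta^m over a step dt."""
    dbeta = np.sqrt(dt) * rng.standard_normal(phi.shape[0])  # independent N(0, dt)
    return np.einsum("m,miyx->iyx", dbeta, phi)
```

With this layout, `a[0, 0] + a[1, 1]` is the local noise variance, which gives the "local measure of the noise strength" mentioned above.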


2 Numerical Schemes 


In this section we derive a two-step numerical scheme in time for the SQG system 
under LU (SQG-LU). We compare this scheme to other multi-step schemes for 
the SPDE, in particular the ones developed in [5] and [4], and show how our 
scheme improves the precision. Concerning discretisation in space, standard spectral 
methods are used: the linear terms are treated in the Fourier space, whilst the
nonlinear terms are discretised in the physical space. 

The derivation of the time scheme consists of two steps: first, we derive a class 
of Milstein schemes for SQG-LU and we empirically verify their convergence, then 
a two-step scheme is proposed. 


2.1 Derivation of a Milstein Scheme 


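To design the Milstein schemes, we consider the integral form of the SPDE in (1),
namely

$$b_t = b_{t_0} + \int_{t_0}^{t}\Big(\frac{1}{2}\nabla\cdot(a\nabla b_s) - v^*\cdot\nabla b_s\Big)ds - \int_{t_0}^{t}\sum_m\nabla b_s\cdot\phi^m\,d\beta_s^m \tag{2}$$

and we can define the following functions:

$$f(b_t, t) = \frac{1}{2}\nabla\cdot(a\nabla b_t) - v^*\cdot\nabla b_t\quad\text{and}\quad g^m(b_t, t) = -\nabla b_t\cdot\phi^m. \tag{3}$$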


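We can now use the functional extension of the Itô formula [3] for both $f$ and $g$ to
write their differential forms:

$$f(b_t, t) = f(b_{t_0}, t_0) + \int_{t_0}^{t}\frac{\partial f}{\partial s}(b_s, s)\,ds + \int_{t_0}^{t}\frac{\partial f}{\partial b}(b_s, s)\,db_s + \frac{1}{2}\int_{t_0}^{t}\frac{\partial^2 f}{\partial b^2}(b_s, s)\,d\langle b, b\rangle_s, \tag{4}$$

$$g^m(b_t, t) = g^m(b_{t_0}, t_0) + \int_{t_0}^{t}\frac{\partial g^m}{\partial s}(b_s, s)\,ds + \int_{t_0}^{t}\frac{\partial g^m}{\partial b}(b_s, s)\,db_s + \frac{1}{2}\int_{t_0}^{t}\frac{\partial^2 g^m}{\partial b^2}(b_s, s)\,d\langle b, b\rangle_s. \tag{5}$$

We remark that, since the basis $\phi^m$ is constant in time, then so is $a$, and the functions
$f$ and $g^m$ do not depend explicitly on time; therefore $\partial f/\partial t = \partial g^m/\partial t = 0$.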


Concerning the first derivatives with respect to $b$, they have to be interpreted as
Fréchet derivatives. The Fréchet derivative of an operator $F$ is the bounded linear
operator $DF(x)$ which satisfies the following relation:

$$\lim_{\|h\|\to 0}\frac{\|F(x+h) - F(x) - DF(x)h\|}{\|h\|} = 0, \tag{6}$$

which implies that for a linear operator $DF(x)h = F(h)$. We start with $g$ and use the
fact that $\nabla$ is a linear operator:

$$\frac{\partial g^m}{\partial b}(b)\,b = -\nabla b\cdot\phi^m - \nabla b\cdot\frac{\partial\phi^m}{\partial b}. \tag{7}$$

If the basis is computed offline, $\phi^m$ does not depend on $b$ and therefore the second
term in (7) is zero. If the basis is computed online and $\phi^m$ does depend on $b$, we can
rewrite the second term of the sum by components and, using the chain rule, one
has:

$$\nabla b\cdot\frac{\partial\phi^m}{\partial b} = \frac{\partial b}{\partial x}\frac{\partial\phi_x^m}{\partial b} + \frac{\partial b}{\partial y}\frac{\partial\phi_y^m}{\partial b} = \nabla\cdot\phi^m. \tag{8}$$


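For the second term of $f$, i.e. $v^*\cdot\nabla b$, the same considerations are valid. To
compute the derivative of the first term of $f$, we remark that it is a composition and
product of three operators, two of which are linear. We can define:

$$F_1(h) = \frac{1}{2}\nabla\cdot h,\qquad F_2(b) = a(b),\qquad F_3(b) = \nabla b. \tag{9}$$

Using the chain rule and the linearity of $F_1$ and $F_3$ one has:

$$\begin{aligned}
D\big(F_1(F_2(b)F_3(b))\big)b &= DF_1\big(F_2(b)F_3(b)\big)\big(DF_2(b)F_3(b) + F_2(b)DF_3(b)\big)b\\
&= F_1\big(F_3(b)DF_2(b)b + F_2(b)F_3(b)\big)\\
&= \frac{1}{2}\nabla\cdot\Big(\frac{\partial a}{\partial b}\nabla b + a\nabla b\Big).
\end{aligned} \tag{10}$$

Finally, with the same considerations used above, we remark that we can write
$(\partial a/\partial b)\nabla b = \nabla\cdot a$. Therefore:

$$\frac{\partial f}{\partial b}\,b = f + \frac{1}{2}\nabla\cdot\nabla\cdot a - \nabla\cdot v^*,\qquad \frac{\partial g^m}{\partial b}\,b = g^m - \nabla\cdot\phi^m. \tag{11}$$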


As for the Itô covariation bracket, one has:

$$\langle b, b\rangle_t = \Big\langle\int_{t_0}^{\cdot}\sum_m g^m(b_s, s)\,d\beta_s^m,\ \int_{t_0}^{\cdot}\sum_k g^k(b_r, r)\,d\beta_r^k\Big\rangle_t = \int_{t_0}^{t}\sum_m\big(g^m(b_s, s)\big)^2 ds.$$

We now assume that we are in either one of the following cases:

— the basis functions $\phi^m$ (and therefore $a$) do not depend on $b$ and $\nabla\cdot v^* = 0$,
— the basis functions $\phi^m$ depend on $b$ but are such that $\nabla\cdot v^* = \nabla\cdot\nabla\cdot a = \nabla\cdot\sigma =
\nabla\cdot\phi^m = 0$.

It can be noticed that the first case corresponds to a noise defined from external
high-resolution data (which thus does not depend on the solution) while the
second case boils down to imposing an incompressibility constraint on the
large scale component, $\nabla\cdot u = 0$, that is indeed often considered in practice with
particular scaling of the noise [1, 2]. With these assumptions, we then have:

$$\frac{\partial f}{\partial b} = f,\qquad \frac{\partial^2 f}{\partial b^2} = 0,\qquad \frac{\partial g^m}{\partial b} = g^m,\qquad \frac{\partial^2 g^m}{\partial b^2} = 0. \tag{12}$$


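We can now substitute all these expressions into (4) and (5), and then (4) and (5) into
(2). Keeping only the terms of order one or lower, we obtain:

$$b_t = b_{t_0} + f(b_{t_0})\Delta t + \sum_m g^m(b_{t_0})\Delta\beta^m + \int_{t_0}^{t}\int_{t_0}^{s}\sum_{m,k}g^m\big(g^k(b_{t_0})\big)\,d\beta_r^k\,d\beta_s^m, \tag{13}$$

where $\Delta t = t - t_0$ and $\Delta\beta^m = \beta_t^m - \beta_{t_0}^m$. We define the following quantities:

$$G^{m,k} := g^m\big(g^k(b_{t_0})\big),\qquad I^{m,k} := \int_{t_0}^{t}\int_{t_0}^{s}d\beta_r^k\,d\beta_s^m,$$

then the double iterated Itô integral in (13) can be decomposed as follows:

$$\sum_{m,k}G^{m,k}I^{m,k} = \sum_{m,k}G^{m,k}\,\frac{I^{m,k} + I^{k,m}}{2} + \sum_{m,k}G^{m,k}\,\frac{I^{m,k} - I^{k,m}}{2}.$$

The first, symmetric term can be computed analytically from the Itô integration by parts
formula, $I^{m,k} + I^{k,m} = \Delta\beta^m\Delta\beta^k - \delta_{m,k}\Delta t$; however, the second, antisymmetric
term $(I^{m,k} - I^{k,m})/2 =: A^{m,k}$ cannot, and it is known as the Lévy area.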


2.1.1 Lévy Area Simulation 


In this subsection, we briefly introduce the methods we used to simulate the Lévy 
area. More details can be found in [6, 8], where these methods were proposed. The 
first method to simulate the Lévy area will be referred to as the weak approximation 
in the rest of this work: in this method, we simulate a random variable that has the 
same moments as the Lévy area. The second method, which will be referred to as 
the conditional method, is a recursive method: the time interval $(t_0, t)$ is recursively
split into two subintervals of the same length, and the two following relations are
used:

$$A^{m,k}_{t_0,t} = A^{m,k}_{t_0,s} + A^{m,k}_{s,t} + \frac{1}{2}\Big((\beta_s^k - \beta_{t_0}^k)(\beta_t^m - \beta_s^m) - (\beta_s^m - \beta_{t_0}^m)(\beta_t^k - \beta_s^k)\Big), \tag{14}$$

$$\mathbb{E}\big[A^{m,k}_{t_0,t}\,\big|\,\beta_t - \beta_{t_0}\big] = 0,$$

where $s = (t_0 + t)/2$ denotes the midpoint of the interval.

For more details on these two methods, see [7]. Finally, we consider a third
approach, where we neglect the Lévy area. We remark that this approach is exact if
$G^{m,k} = G^{k,m}$, which is not the case here.
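The splitting relation used by the conditional method can be checked numerically on a sampled Brownian path. The sketch below uses left-point (Itô) sums as a stand-in for the iterated integrals, with the convention that the area is half the difference of the two iterated integrals; for such sums the splitting identity holds exactly, up to rounding. The function names are illustrative and this is not the simulation algorithm of [6, 8], only a consistency check of the recursion it relies on.

```python
import numpy as np

def iterated_integral(beta_k, beta_m):
    """Left-point sum for the integral of (beta^k_s - beta^k_{t0}) d beta^m_s."""
    return np.sum((beta_k[:-1] - beta_k[0]) * np.diff(beta_m))

def levy_area(beta_m, beta_k):
    """Antisymmetric part of the iterated integrals on a sampled path."""
    return 0.5 * (iterated_integral(beta_k, beta_m) - iterated_integral(beta_m, beta_k))

def split_levy_area(beta_m, beta_k):
    """Rebuild the area over the whole interval from its two halves."""
    h = (beta_m.size - 1) // 2                       # index of the midpoint
    left = levy_area(beta_m[:h + 1], beta_k[:h + 1])
    right = levy_area(beta_m[h:], beta_k[h:])
    cross = 0.5 * ((beta_k[h] - beta_k[0]) * (beta_m[-1] - beta_m[h])
                   - (beta_m[h] - beta_m[0]) * (beta_k[-1] - beta_k[h]))
    return left + right + cross
```

Applied recursively, the same relation lets one refine the two half-interval areas while the cross term is computed from the Brownian increments alone.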


2.2 Multi-Step Schemes 


We next propose a two-step scheme in which the Milstein method is used as the
prediction step and the Euler method is adopted as the correction step. It reads:

$$\begin{aligned}
b^* &= b_n + f(b_n, u_n)\Delta t + \sum_m g^m(b_n)\Delta\beta^m + \sum_{m,k}G^{m,k}\big(S^{m,k} + A^{m,k}\big),\\
u^* &= H(b^*),\\
b_{n+1} &= \frac{1}{2}b_n + \frac{1}{2}\Big(b^* + f(b^*, u^*)\Delta t + \sum_m g^m(b^*)\Delta\beta^m\Big),
\end{aligned} \tag{15}$$

where $G^{m,k} = g^m(g^k(b_n))$, $S^{m,k} := (\Delta\beta^m\Delta\beta^k - \delta_{m,k}\Delta t)/2$ and $A^{m,k}$ is one of the approximations of
the Lévy area described in the previous subsection. This scheme will be referred to
as SRK2-EM (EM stands for Euler-Milstein, not for Euler-Maruyama) in the rest of
the paper.
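The structure of the two-step scheme can be sketched on a generic finite-dimensional system. In this illustration the drift `f` is a callable that already includes the velocity update (so it plays the role of the drift evaluated with the velocity recomputed from the current state), each `g_ops[m]` is a callable assumed linear in its argument (as required for the second-order term to be a composition of the diffusion operators), and the Lévy area defaults to zero, which corresponds to the Milstein-0 predictor. All names are hypothetical.

```python
import numpy as np

def srk2_em_step(b, f, g_ops, dt, dbeta, levy=None):
    """One step of a Milstein predictor followed by an Euler corrector.

    b: state array, f: drift callable, g_ops: list of M linear diffusion callables,
    dbeta: array of M Brownian increments, levy: optional (M, M) Levy area matrix.
    """
    M = len(g_ops)
    S = 0.5 * (np.outer(dbeta, dbeta) - dt * np.eye(M))  # symmetric iterated integrals
    A = np.zeros((M, M)) if levy is None else levy
    # Prediction: Milstein step.
    b_star = b + f(b) * dt
    for m in range(M):
        b_star = b_star + g_ops[m](b) * dbeta[m]
        for k in range(M):
            b_star = b_star + g_ops[m](g_ops[k](b)) * (S[m, k] + A[m, k])
    # Correction: Euler step from the predicted state, reusing the same increments.
    corr = b_star + f(b_star) * dt
    for m in range(M):
        corr = corr + g_ops[m](b_star) * dbeta[m]
    return 0.5 * b + 0.5 * corr
```

Reusing the same Brownian increments in both stages is what makes the step a two-stage method over a single time interval rather than two consecutive steps.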

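In the next section, we first analyse the results of the Milstein schemes with the
different Lévy area approximations in order to select the best one. Then, we compare
our multi-step scheme to two other multi-step schemes developed in [5] and [4]. We
briefly recall them here. The first one, based on a third order Runge-Kutta scheme
(SSPRK3) [5], is:

$$\begin{aligned}
b^{(1)} &= b_n + f_S(b_n, u_n)\Delta t + \sum_m g^m(b_n)\Delta\beta^m\\
u^{(1)} &= H(b^{(1)})\\
b^{(2)} &= \frac{3}{4}b_n + \frac{1}{4}\Big(b^{(1)} + f_S(b^{(1)}, u^{(1)})\Delta t + \sum_m g^m(b^{(1)})\Delta\beta^m\Big)\\
u^{(2)} &= H(b^{(2)})\\
b_{n+1} &= \frac{1}{3}b_n + \frac{2}{3}\Big(b^{(2)} + f_S(b^{(2)}, u^{(2)})\Delta t + \sum_m g^m(b^{(2)})\Delta\beta^m\Big)
\end{aligned} \tag{16}$$

where $f_S = f - \nabla\cdot(a\nabla b)/2$ denotes the modified drift under the Stratonovich integral.

The second one, which relies on the Euler-Heun method [4], also for the Stratonovich integral,
reads:

$$\begin{aligned}
b^{(1)} &= b_n + f_S(b_n, u_n)\Delta t + \sum_m g^m(b_n)\Delta\beta^m\\
u^{(1)} &= H(b^{(1)})\\
b_{n+1} &= \frac{1}{2}b_n + \frac{1}{2}\Big(b^{(1)} + f_S(b^{(1)}, u^{(1)})\Delta t + \sum_m g^m(b^{(1)})\Delta\beta^m\Big)
\end{aligned} \tag{17}$$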

3 Numerical Results 


In this section we show some numerical results. First, the effect of the different 
approximations of the Lévy area is studied on the Milstein scheme. Then, the multi- 
step scheme is assessed and compared to the ones already proposed in the literature. 
We focus on two variations of one specific test case plotted in Fig. 1: the initial
condition (left) consists of two warm elliptical anticyclones on the bottom of the
domain and two cold elliptical cyclones on the top. After one day under moderate
noise (centre), the four structures have rotated by approximately 45°. After one day
under strong noise (right) the nonlinearity of the dynamics is more noticeable. One
can find all the configuration details used for these simulations in Chapter 6 of [10]
for the moderate noise configuration. For the strong noise, all the basis functions $\phi^m$
are multiplied by a factor 10.
We will use the following abbreviations for the different numerical schemes 


— Euler: Euler-Maruyama scheme. 

— Milstein-0: Milstein scheme without the Lévy area. 

— Milstein-weak: Milstein scheme with the weak approximation of the Lévy area. 

— Milstein-cond-n: Milstein scheme with the conditional approximation of the 
Lévy area. Here n stands for the number of times the interval is recursively split 
(cf. (14)). 

— SRK2-EM: scheme (15) with $A^{m,k} = 0$.

— SSPRK3: scheme (16). 

— Heun: scheme (17). 


[Figure: three panels, each showing one realization, at t = 0 day, t = 1 day and t = 1 day]

Fig. 1 Euler-Maruyama simulation of system (1) on a 128 x 128 spatial grid

Fig. 2 RMSE (normalised by the amplitude of buoyancy $B_0 = 10^{-3}$ m/s$^2$) of different schemes
during 30 days of simulation under moderate noise

Fig. 3 Convergence of different schemes under weak and strong noise. Order 1 in dotted black,
order 0.5 in dashed black


In Figs. 2 and 3 one can see the difference between the Euler-Maruyama scheme
and all the Milstein schemes proposed. In Fig. 2 we plot, for each scheme and for a period
of 30 days, the root mean squared error (RMSE), defined as:

$$\mathrm{RMSE} = \Big(\mathbb{E}\big[\|b_h - b\|^2_{L^2(\Omega)}\big]\Big)^{1/2}, \tag{18}$$

where $\Omega$ denotes the spatial domain, $b_h$ is the numerical solution of the stochastic
system (1), and $b$ stands for the reference solution downsampled from a high-resolution
deterministic simulation (recall that the aim of the stochastic setting
is to reproduce on a coarse grid high-resolution deterministic simulations). The
downsampling procedure consists of a first low-pass filtering performed in the
Fourier domain and a subsequent subsampling operation. The expectations are
estimated from 30 realizations. These results are obtained with a $\Delta t$ twice as
small for the Euler scheme as for the other schemes. One can observe that
Milstein-0 performs slightly better than the other Milstein schemes.
In Fig. 3, we show the rate of strong convergence $\gamma$ of all the schemes discussed,
under weak and strong noise. Since the exact solution is unknown, we use the
following method [15] to estimate $\gamma$, for a sufficiently small $\Delta t$:

$$\gamma = \log_2\frac{e_i}{e_{i+1}}\quad\text{with}\quad e_i := \Big(\mathbb{E}\Big[\big\|b_h(T, \Delta t/2^{i-1}) - b_h(T, \Delta t/2^{i})\big\|^2_{L^2(\Omega)}\Big]\Big)^{1/2},$$

where $b_h(T, \Delta t)$ is the numerical solution at the final time $T$ obtained with a
time step $\Delta t$. It is important to underline that in order for this method to work,
the Brownian trajectories must be fixed. We applied this method for time steps
30, 60, 120, 240, hence obtaining two estimates for $\gamma$. It is important to remark
that the value of the time steps is given in seconds and the time-scale of the studied
phenomenon is of the order of one day. For reference, the CFL condition for this
problem at the initial time would give a time step around 300 s. The smallest time
step we considered to obtain this estimate is ten times smaller than this. As one can
see from Fig. 3, under weak noise all the one-step schemes provide almost identical
results and all the multi-step schemes are very similar. It is hard to distinguish among
the different numerical schemes proposed. In particular, for the considered span of
time steps, the error of the Euler scheme under moderate noise displays a linear
trend and the prevailing convergence order in this case is one. The reason for this is
explained in the Appendix.
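The rate estimate described above can be sketched as follows, assuming the numerical solutions at the final time are available for a sequence of time steps that halve from one entry to the next (here 240, 120, 60, 30 s) and were all computed with the same Brownian trajectories. The discrete norm and the function names are illustrative assumptions.

```python
import numpy as np

def l2_norm(field):
    """Discrete (grid-averaged) L2 norm of a field."""
    return np.sqrt(np.mean(np.abs(field) ** 2))

def strong_rates(solutions):
    """Estimate convergence rates from solutions at successively halved time steps.

    solutions: list ordered from the coarsest to the finest time step; each entry
    has shape (n_realizations, ...) and shares its Brownian paths with the others.
    """
    errors = []
    for coarse, fine in zip(solutions[:-1], solutions[1:]):
        sq = [l2_norm(c - f) ** 2 for c, f in zip(coarse, fine)]
        errors.append(np.sqrt(np.mean(sq)))   # root mean square over realizations
    errors = np.array(errors)
    return np.log2(errors[:-1] / errors[1:])  # one estimate per consecutive pair
```

Four time steps give three successive differences and therefore two rate estimates, as stated in the text.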

Under strong noise, it is easier to see the differences among the schemes.
Milstein-weak is a slight improvement on the Euler-Maruyama scheme, but its rate of
convergence is far from 1. Milstein-0 has the highest rate of convergence among
all the schemes.

In conclusion, Milstein-0 seems to perform better than the other Milstein schemes.
Furthermore, it is less computationally demanding. For these reasons, we built our
two-step scheme based on Milstein-0.

In Fig. 3 we also compare the multi-step schemes mentioned above: they all have
a similar behaviour, with a rate of convergence $0.5 < \gamma < 1$, but a much smaller
error when compared to the one-step schemes. In particular, the two-step scheme
proposed in this work (SRK2-EM in the figures) yields the smallest error of all for
this test case. The SRK2-EM scheme also yields the smallest RMSE (cf. Fig. 2).


4 Conclusion and Perspectives 


The Milstein schemes analysed in this work improve the numerical results, in 
particular when used in a multi-step framework. The Lévy area does not seem to play 
a key role in these test cases, which allows us to drastically reduce the computational 
costs. It must be pointed out that under weak noise, all the schemes tested provide
very similar results. Ongoing and future work includes understanding the
(non) importance of the Lévy area and whether this is related to the test case, the
equations, or other factors.
Appendix: Convergence of Euler-Maruyama Scheme Under
Moderate Noise


To study the behaviour of our system under moderate noise, we use the formalism
of [12]; in particular, we write our system in the following generic form:

$$dX_t = a(x, t)\,dt + \epsilon\,b(x, t)\,dW_t + \epsilon^2 c(x, t)\,dt,\qquad t\in[0, T] \tag{19}$$

with $a$, $b$, $c$ being jointly $L^2$-measurable in $(x, t)$, Lipschitz, bounded linear-growth
functions in $x$.

Let $Y^\delta$ be an Euler-Maruyama integration scheme for $X_t$ with integration step $\delta$.
Then we may prove in a similar fashion to theorem 4.5.4 in [9] that:

1. $\mathbb{E}[X_t^2] \le C,\ \forall t\in[0, T]$
2. $\mathbb{E}\big[|X_\delta - Y_\delta^\delta|\ \big|\ X_{t=0} = x\big] \le K\big(\delta + \sqrt{\epsilon}\sqrt{\delta} + o(\delta)\big)$.

Using this and the Lipschitz continuity of the coefficients in (19), we may prove a
result, to some extent similar to theorem 2.1 in [12], namely that

$$\mathbb{E}\Big[\sup_{t_0\le t\le T}|X_t - Y_t^\delta|\ \Big|\ X_{t_0} = x\Big] \le K'(x)\big(\delta + \sqrt{\epsilon}\sqrt{\delta} + o(\delta)\big). \tag{20}$$

In light of this estimate, we may interpret the convergence rate displayed in Fig. 3
as a case where $\delta$ is not small enough when compared to $\epsilon$, so that $\sqrt{\epsilon}\sqrt{\delta}$ does not
necessarily prevail over $\delta$, which is evidenced by the linear rate of convergence.
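To make the comparison between the two leading terms of the bound explicit, the following short step uses only the form of the estimate (20), with the constants left aside:

$$\sqrt{\epsilon}\sqrt{\delta} \le \delta \iff \sqrt{\epsilon} \le \sqrt{\delta} \iff \epsilon \le \delta.$$

Hence, as long as the time step satisfies $\delta \ge \epsilon$, the linear term dominates the bound and the observed rate is close to one; the order 0.5 behaviour can only become visible once $\delta$ is small compared to $\epsilon$.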
The Dissipation Properties of Transport
Noise


Franco Flandoli and Eliseo Luongo 


Abstract The aim of this work is to present, in a compact way, the latest results 
about the dissipation properties of transport noise in fluid mechanics. Starting from 
the reasons why transport noise is natural in a passive scalar equation for the heat 
diffusion and transport, several results about enhanced dissipation due to the noise 
are presented. Rigorous statements are matched with numerical experiments in order 
to understand that the sufficient conditions stated are not yet optimal but give a first 
useful indication. 


Keywords Dissipation by noise - Turbulence - Eddy diffusion - Vortex patch - 
Transport noise - Dirichlet boundary condition 


1 Introduction 


In the last four years, a new understanding of heat diffusion in a turbulent fluid 
modeled by white noise has been developed. This model has the interesting feature 
of describing properly the dissipation properties of a turbulent fluid. The equation 
for the heat diffusion and transport, with a heat source q, is 


$$\partial_t\theta + u \cdot \nabla\theta = \kappa\Delta\theta + q \tag{1}$$


where $\theta = \theta(t, x)$ is the temperature, $\kappa$ is the diffusion constant and $u = u(t, x)$ is
the velocity field of the fluid. The turbulent fluid is a priori described by a random
field, Gaussian and white in time, with covariance structure given a priori (hence the
temperature is a passive scalar). In this review we consider the following description
for $u$:

$$u(t, x) = \sum_{k \in K} \sigma_k(x) \frac{dW_t^k}{dt} \tag{2}$$


where $\sigma_k$ are divergence-free vector fields satisfying no-slip boundary conditions
and $W^k$ are independent Brownian motions on a filtered probability space
$(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \ge 0}, P)$; for simplicity, assume $K$ is a finite set, but the case of a
countable set can be studied without trouble at the price of additional summability
assumptions. Some rigorous justifications for describing the velocity of a turbulent
fluid by Eq. (2) are given in Sect. 2.2. Here we just want to give some ideas. Let us
denote by u™® the solution in a domain with boundary D of the SPDE 


duh + Vp? = vAu”? — uve ae 2 oe oKd,d Wy 
div (u”®) =0 (3) 


ulap = 0, 


where the terms — ture + 1 X reg Ord WE describe the roughness of the boundary 


as stated in Sect. 2.2. Let, moreover, wr’ = fe u”*(s) ds, then it can be proven 
than 


ae ence n — doo Wiz | SU, 


E= 
keK 


see for example [6]. 
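The correct interpretation of Eq. (1) when $u$ has the form (2) is the Stratonovich
equation


$$d\theta + \sum_{k \in K} \sigma_k \cdot \nabla\theta \circ dW_t^k = (\kappa\Delta\theta + q)\,dt \tag{4}$$


or equivalently the Itô equation with corrector $\mathcal{L}\theta$ given by the second order
differential operator (7) below:


$$d\theta + \sum_{k \in K} \sigma_k \cdot \nabla\theta\, dW_t^k = (\kappa\Delta\theta + \mathcal{L}\theta + q)\,dt. \tag{5}$$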
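
The passage from the Stratonovich form (4) to the Itô form (5) can be checked by a short formal computation. The following sketch assumes enough regularity to apply the Stratonovich-to-Itô conversion term by term, and uses only the independence of the Brownian motions, the condition $\nabla \cdot \sigma_k = 0$ and the covariance $Q(x, y) = \sum_{k \in K} \sigma_k(x) \otimes \sigma_k(y)$ of Sect. 2.1; it is a heuristic outline rather than the rigorous argument of the cited references.

```latex
\begin{align*}
\sigma_k \cdot \nabla\theta \circ dW_t^k
  &= \sigma_k \cdot \nabla\theta \, dW_t^k
     + \tfrac12\, d\big[\sigma_k \cdot \nabla\theta,\, W^k\big]_t , \\
d\big[\sigma_k \cdot \nabla\theta,\, W^k\big]_t
  &= -\,\sigma_k \cdot \nabla(\sigma_k \cdot \nabla\theta)\, dt
  \qquad \text{(the martingale part of } d\theta \text{ is }
     -\textstyle\sum_j \sigma_j \cdot \nabla\theta\, dW_t^j\text{)} , \\
\tfrac12 \sum_{k \in K} \sigma_k \cdot \nabla(\sigma_k \cdot \nabla\theta)
  &= \tfrac12 \sum_{i,j} \partial_i\Big(\sum_{k \in K} \sigma_k^i \sigma_k^j\, \partial_j\theta\Big)
   = \tfrac12 \sum_{i,j} \partial_i\big(Q_{ij}(x,x)\, \partial_j\theta\big) .
\end{align*}
```

Moving the resulting drift to the right-hand side gives exactly the second order corrector of the Itô equation; the divergence-free condition is what allows the first-order operator to be written in divergence form.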


There are some motivations for the analysis of Eq. (4) based on the idea to extend to 
SPDE the remarkable principle of Wong-Zakai [20], see for example [2, 3, 14, 15, 
18, 19, 16]. 

Assuming that the external source $q$ and the initial temperature $\theta_0$ are
deterministic, under suitable mild assumptions the deterministic function


$$\Theta(t, x) = E[\theta(t, x)]$$

is the solution of the deterministic parabolic equation


$$\partial_t\Theta = (\kappa\Delta + \mathcal{L})\Theta + q \tag{6}$$


where $E$ denotes the mathematical expectation on $(\Omega, \mathcal{F}, P)$. The main results of
recent years are quantitative estimates on the difference $\theta - \Theta$, some convergence
properties of the solution of Eq. (5) to the stationary solution of Eq. (6) and the
enhanced dissipative properties of the second order differential operator $\kappa\Delta + \mathcal{L}$,
see [7, 9, 10]. These results properly explain the dissipation properties of
transport noise and are the core of this review article.

In Sect. 2 we will present some motivations for the analysis of Eq. (4) as a
good model for heat diffusion in a turbulent fluid and we will introduce the
main notation. In Sect. 3 we will present the main results, referring to [7, 9, 10]
for the rigorous proofs. Lastly, in Sect. 4 we will present some cases where
the coefficients $\sigma_k$ introduce more dissipation in the model than predicted by
the rigorous sufficient conditions, relying on explicit computations or numerical
simulations following the ideas of [7, 10].


Remark 1 In this review we only consider the effects of transport noise on
passive scalars. Some results can also be stated for the scalar vorticity of
the fluid itself, in two space dimensions; we refer to [8, 11, 12] for further reading.
The case of the influence on vector fields is much more difficult and still to be
understood.


2 Well-Posedness and Motivations 


2.1 Notations and Definitions 


In this review we will denote by $D$ a 2D domain with boundary, either a smooth
bounded open set or an infinite 2D channel, namely $\mathbb{R} \times (-1, 1)$. We write the
coordinates using the notation


$$x = (x_1, z) \in D.$$

Let $Z$ be a separable Hilbert space, denote by $L^2(\mathcal{F}_{t_0}, Z)$ the space of square
integrable random variables with values in $Z$, measurable with respect to $\mathcal{F}_{t_0}$.


Moreover, denote by $C_{\mathcal{F}}([0, T]; Z)$ the space of continuous adapted processes
$(X_t)_{t \in [0,T]}$ with values in $Z$ such that


$$E\Big[\sup_{t \in [0,T]} \|X_t\|_Z^2\Big] < \infty$$

and by $L^2_{\mathcal{F}}(0, T; Z)$ the space of progressively measurable processes $(X_t)_{t \in [0,T]}$
with values in $Z$ such that

$$E\Big[\int_0^T \|X_t\|_Z^2\, dt\Big] < \infty.$$


Denote by $L^2(D)$ and $W^{k,2}(D)$ the usual Lebesgue and Sobolev spaces and by
$W_0^{k,2}(D)$ the closure in $W^{k,2}(D)$ of smooth compact support functions. Set $H = L^2(D)$,
$V = W_0^{1,2}(D)$, $D(A) = W^{2,2}(D) \cap V$. We denote by $\langle\cdot, \cdot\rangle$ and $\|\cdot\|$ the
inner product and the norm in $H$ respectively.

Assume that $K$ is a finite set and $\sigma_k \in (D(A) \cap C_c(D))^2$, $\nabla \cdot \sigma_k = 0$, $k \in K$
(less is sufficient but we do not stress this level of generality). Define the
matrix-valued function


$$Q(x, y) = \sum_{k \in K} \sigma_k(x) \otimes \sigma_k(y).$$


If we denote by $W(t, x)$ the vector valued random field


$$W(t, x) = \sum_{k \in K} \sigma_k(x) W_t^k$$


(the velocity field $u$ given by (2) is the distributional time derivative of $W$) then we
see that $Q(x, y)$ is the space-covariance of $W(1, x)$:


$$Q(x, y) = E[W(1, x) \otimes W(1, y)].$$


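The matrix-function $Q(x, x)$ is elliptic:


$$\sum_{i,j=1}^d Q_{ij}(x, x)\,\xi_i\xi_j = E\big[|W(1, x) \cdot \xi|^2\big] \ge 0$$

for all $\xi = (\xi_1, \ldots, \xi_d) \in \mathbb{R}^d$. Associated to it define the bounded linear operator

$$\mathbb{Q} : L^2(D; \mathbb{R}^2) \to L^2(D; \mathbb{R}^2), \qquad (\mathbb{Q}v)(x) = \int_D Q(x, y)v(y)\,dy$$


and the quantities:


$$q(x) = \min_{\xi \ne 0} \frac{\xi^T Q(x, x)\xi}{|\xi|^2}, \qquad \epsilon_Q = \big\|\mathbb{Q}^{1/2}\big\|^2_{L^2(D;\mathbb{R}^2) \to L^2(D;\mathbb{R}^2)}.$$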
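
As a concrete illustration of the pointwise quantity $q(x)$, the following sketch builds $Q(x, x) = \sum_k \sigma_k(x) \otimes \sigma_k(x)$ at a single point from the values of the fields there and returns its smallest eigenvalue. It assumes NumPy; the function name `pointwise_ellipticity` and the sample vectors are hypothetical and chosen only for illustration.

```python
import numpy as np

def pointwise_ellipticity(sigmas_at_x):
    """Return (Q(x, x), q(x)) from the values sigma_k(x), one row per index k."""
    s = np.asarray(sigmas_at_x, dtype=float)
    Q = s.T @ s                        # sum over k of sigma_k(x) (x) sigma_k(x)
    q = np.linalg.eigvalsh(Q).min()    # minimum of xi^T Q xi over unit vectors xi
    return Q, q

# A single field gives a rank-one matrix, hence a degenerate direction.
_, q_single = pointwise_ellipticity([[1.0, 0.0]])
# Two non-parallel fields make Q(x, x) positive definite.
_, q_pair = pointwise_ellipticity([[1.0, 0.0], [0.0, 2.0]])
```

In this toy setting a strictly positive $q(x)$ needs at least two fields that are not parallel at $x$, which is consistent with the remark in Sect. 4.2 that certain supports have to overlap for the noise to act everywhere.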

Consider the divergence form elliptic operator $\mathcal{L}$ defined as


$$(\mathcal{L}\theta)(x) = \frac12 \sum_{i,j=1}^d \partial_i\big(Q_{ij}(x, x)\,\partial_j\theta(x)\big) \tag{7}$$

for $\theta \in W^{2,2}(D)$. Define the linear operator $A : D(A) \subset H \to H$ as

$$A\theta = (\kappa\Delta + \mathcal{L})\theta.$$


It is the infinitesimal generator of an analytic semigroup of negative type, see [1, 4,
13, 17], that we denote by $e^{tA}$, $t \ge 0$. Moreover, if $D$ is bounded, we denote by $\kappa\lambda$
the first eigenvalue of $-\kappa\Delta$ and by $\kappa\lambda_{\mathcal{L}}$ the first eigenvalue of $-(\kappa\Delta + \mathcal{L})$.


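Definition 1 Given $\theta_0 \in L^2(\mathcal{F}_0, H)$ and $q \in L^2(0, T; H)$, a stochastic process

$$\theta \in C_{\mathcal{F}}([0, T]; H) \cap L^2_{\mathcal{F}}(0, T; V)$$


is a mild solution of Eq. (5) if the following identity holds


$$\theta(t) = e^{tA}\theta_0 + \int_0^t e^{(t-s)A} q(s)\,ds - \sum_{k \in K} \int_0^t e^{(t-s)A}\, \sigma_k \cdot \nabla\theta(s)\, dW_s^k$$


for every $t \in [0, T]$, $P$-a.s.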


Theorem 1 For every $\theta_0 \in L^2(\mathcal{F}_0, H)$ and $q \in L^2(0, T; H)$ there exists a unique
mild solution $\theta$ of Eq. (5). Moreover $\theta$ depends continuously on $\theta_0$ and $q$.


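Definition 2 Given $\theta_0 \in L^2(\mathcal{F}_0, H)$ and $q \in L^2(0, T; H)$, we say that a stochastic
process $\theta$ is a weak solution of Eq. (5) if


$$\theta \in C_{\mathcal{F}}([0, T]; H) \cap L^2_{\mathcal{F}}(0, T; V)$$

and for every $\phi \in D(A)$, we have


$$\langle\theta(t), \phi\rangle = \langle\theta_0, \phi\rangle + \int_0^t \langle\theta(s), A\phi\rangle\,ds + \int_0^t \langle q(s), \phi\rangle\,ds + \sum_{k \in K} \int_0^t \langle\theta(s), \sigma_k \cdot \nabla\phi\rangle\,dW_s^k$$

for every $t \in [0, T]$, $P$-a.s.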


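Theorem 2 $\theta$ is a weak solution of problem (5) if and only if it is a mild solution of
problem (5). Moreover the Itô formula


$$\|\theta(t)\|^2 - \|\theta_0\|^2 = 2\int_0^t \langle\theta(s), q(s)\rangle\,ds + \sum_{k \in K} \int_0^t \|\sigma_k \cdot \nabla\theta(s)\|^2\,ds - 2\int_0^t \big\langle(-A)^{1/2}\theta(s), (-A)^{1/2}\theta(s)\big\rangle\,ds$$


holds.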
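
It may help to unpack the last term of this Itô formula. The sketch below assumes $\theta(s) \in V$, so that integration by parts is justified, and uses only the definition (7) of the corrector together with $Q_{ij}(x, x) = \sum_{k \in K} \sigma_k^i(x)\sigma_k^j(x)$; it is offered as a reading aid, not as part of the cited proofs.

```latex
\begin{align*}
\big\langle (-A)^{1/2}\theta, (-A)^{1/2}\theta \big\rangle
  &= \kappa \|\nabla\theta\|^2
     + \tfrac12 \int_D \sum_{i,j} Q_{ij}(x,x)\, \partial_i\theta\, \partial_j\theta \, dx
   = \kappa \|\nabla\theta\|^2
     + \tfrac12 \sum_{k \in K} \|\sigma_k \cdot \nabla\theta\|^2 , \\
\|\theta(t)\|^2 - \|\theta_0\|^2
  &= 2\int_0^t \langle \theta(s), q(s) \rangle \, ds
     - 2\kappa \int_0^t \|\nabla\theta(s)\|^2 \, ds .
\end{align*}
```

Under these assumptions the Itô correction cancels exactly against the contribution of the corrector, so pathwise the $L^2$ balance has the same form as for the forced heat equation. This suggests that the enhanced dissipation discussed in Sect. 3 should be read as a statement about the mean and about tested quantities rather than about the pathwise $L^2$ norm.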

These results are classical and can be found in [5, 10] together with several
generalizations.


2.2 Motivations 


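In this section we want to give some heuristics to accept Eq. (4) as a correct model
for heat diffusion in a turbulent fluid. In the domain $D$ we have a fluid with velocity
$u$ (pressure $p$, constant density $= 1$) and the heat $\theta$. Both $u$ and $\theta$ are equal to zero
on $\partial D$:


$$u|_{\partial D} = 0, \qquad \theta|_{\partial D} = 0.$$

The condition $u|_{\partial D} = 0$ raises several interesting technical questions. The
equations are


$$\begin{cases}
\partial_t u + u \cdot \nabla u + \nabla p = f \\
\nabla \cdot u = 0 \\
\partial_t\theta + u \cdot \nabla\theta = \kappa\Delta\theta + q \\
u|_{t=0} = u_0 \\
\theta|_{t=0} = \theta_0
\end{cases} \tag{8}$$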

where $f$ and $q$ account for the interaction with external sources. In particular, physical
boundaries are never completely smooth. Hence, the external source $f$ is meant to
model the effects of the roughness of the boundary and its influence on the velocity
of the fluid. The instability of the flow at the boundary, which generates vortices, is very
strong, hence the frequency and intensity of creation of vortices at the boundary
depend strongly on the imprecision in the description of the true boundary.
Replacing the true details of the boundary by a random mechanism of vorticity
production would increase the realism of the model. Emergence of vortices near
obstacles is commonly observed and we content ourselves with an ad hoc inclusion
of this fact into the equations. Assume the velocity field at time $t$ is $u(t, x)$. Assume
that, as a consequence of an instability near the boundary, a modification occurs and
in a very short time we have a field $u(t + \Delta t, x)$ which is not just equal to the smooth
evolution of $u(t, x)$. We may assume that at some time $t$ we have a jump:


$$u(t + \Delta t, x) = u(t, x) + \sigma(x)$$
where $\sigma(x)$ is presumably localized in space and corresponds to a vortex structure.


After these preliminary comments we can accept to model the roughness via a 
friction term of intensity — and a term of jump described by 
k,1 NE


og(x) N24 /62 T ANN 2/02 
W(t, x) = be ’ 
keK N v2 


where N, are independent Poisson processes. More on this topic can be found in 
[6]. Applying a Donsker invariance principle to the stochastic process Wy (t, x), it 
converges in law to the gaussian process 


1 
Wie) = = DP Ww, 
keK 


where $W^k$ are independent Brownian motions. Parameterizing the solutions of
system (8) by $\epsilon$ we arrive at the following stochastic coupled system


jue +u? - Vu? + Vp? = -t (uè — 0,W) 


V-u’ =0 

0,0° +u? -VO = KAO’ +q (9) 
Uul;—0 = uo 
O\:-0 = 90. 


The last step in moving from system (9) to Eq. (4) is to understand the
behavior of system (9) as $\epsilon \to 0$; it is based on a result proved in [12] for the
2D torus, while the case of general 2D domains with boundary is still under
analysis. Thus, just for the last statement of this subsection, we assume that
$D = \mathbb{T}^2 := \mathbb{R}^2/(2\pi\mathbb{Z}^2)$.


Theorem 3 Under previous assumptions on q and o, if moreover: 


— the coefficients og are zero-mean and there exists | > 1 such that 
on € W' Wk € K; 

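- $\forall x \in \mathbb{T}^2$ it holds $\sum_{k \in K} ((K * \sigma_k) \cdot \nabla\sigma_k)(x) = 0$;

- $q \in L^1([0, T]; L^\infty(\mathbb{T}^2))$;

- $\theta_0 \in L^\infty(\mathbb{T}^2)$;


then for every $f \in L^1(\mathbb{T}^2)$


$$E\left[\left|\int_{\mathbb{T}^2} (\theta_t^\epsilon - \theta_t)(x) f(x)\,dx\right|\right] \to 0 \quad \text{as } \epsilon \to 0$$


for every fixed $t \in [0, T]$ and in $L^p([0, T])$ for every finite $p$. Moreover, if
$q \in L^1([0, T]; \mathrm{Lip}(\mathbb{T}^2))$ then the previous convergence holds uniformly for $t \in [0, T]$
and $f \in \mathrm{Lip}(\mathbb{T}^2)$ with $[f]_{\mathrm{Lip}(\mathbb{T}^2)} \le 1$ and $\|f\|_\infty \le 1$.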


3 Main Results 


The results related to the analysis of these equations can be classified in three
categories:


1. Convergence of the solution of Eq. (5) to some quantities related to Eq. (6).
2. Quantification of the dissipation of the function $E[\|\theta(t)\|^2]$.
3. Enhanced dissipative properties of the second order differential operator $\kappa\Delta + \mathcal{L}$.


Remark 2 Even if $\mathbb{Q}$ is a covariance operator, the third question is far from trivial.
In fact we assumed that $\sigma_k|_{\partial D} = 0$. Thus the operator $\mathcal{L}$ degenerates at the boundary.


We will treat all three problems above, sometimes specializing our general
framework.


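Theorem 4 Assume $D$ is a bounded domain.


1. If $\theta_0 \in L^2(\mathcal{F}_0, H)$ and $q = 0$, then, $\forall \phi \in L^\infty(D)$,


$$E\big[\langle\theta(t) - \Theta(t), \phi\rangle^2\big] \le \frac{\epsilon_Q}{\kappa}\, E\big[\|\theta_0\|^2\big]\, \|\phi\|^2_{L^\infty}.$$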


2. Moreover, if 09 > 0 


[10o] < (2 + 2101) E [ieot]. 


Remark 3 A result similar to the first item can also be proved when $D$ is an
infinite channel and $q \ne 0$, adapting the proof of Theorem 7 in [10] to such a finite
time case.


Thanks to the previous theorem, it is evident that the dissipation properties of the
solution of the stochastic Eq. (5) are influenced not only by the first eigenvalue of the
operator $\mathcal{L}$ but also by the operator norm of $\mathbb{Q}^{1/2}$. Thus, our next step will be
to state some sufficient conditions in order to have $\epsilon_Q$ very small and $\kappa\lambda_{\mathcal{L}} \gg \kappa\lambda$.
For $\delta > 0$ fixed, let us define


$$D_\delta := \{x \in D : \mathrm{dist}(x, \partial D) > \delta\}.$$


Then the following theorems hold. 


Theorem 5 Assume that the family of coefficients $(\sigma_k(\cdot))_{k \in K}$ has the following
approximate orthogonality property: there exists a finite number $M \in \mathbb{N}$ and a
partition $K = K_1 \cup \ldots \cup K_M$ such that


$$\langle\sigma_k, \sigma_{k'}\rangle = 0 \quad \text{for all } k, k' \in K_i,\ k \ne k',$$

for all $i = 1, \ldots, M$. Then


$$\epsilon_Q \le M \sup_{k \in K} \|\sigma_k\|^2.$$

Theorem 6 Assume that $q(x) \ge \sigma^2$ in $D_\delta$. Then, for any $\kappa > 0$ fixed,


$$\lim_{(\sigma, \delta) \to (+\infty, 0)} \kappa\lambda_{\mathcal{L}} = +\infty.$$


Theorem 7 There exists a constant $C_D > 0$ such that
kàc > Cp min (o, =) 
for every $Q$ such that
$$q(x) \ge \sigma^2 \quad \text{in } D_\delta.$$


When $D$ is the unit ball, asymptotically as $\delta \to 0$, one can take $C_D = 1$ and


KAc = eg 
ae Se 

From the last two theorems we understand that dissipation is enhanced if
$\epsilon_Q$ is very small and $q(x)$ is very large except in a small boundary layer around $\partial D$.
Clearly $\epsilon_Q$ is related to the operator norm of $\mathbb{Q}^{1/2}$ and thus, loosely speaking,
to the operator norm of $\mathbb{Q}$. Instead $q(x)$ is related to the trace of $\mathbb{Q}$, i.e.


$$\mathrm{Tr}(\mathbb{Q}) = \int_D \mathrm{Tr}\, Q(x, x)\,dx.$$


Consequently we want the operator norm of $\mathbb{Q}$ to be small and the trace of $\mathbb{Q}$
to be arbitrarily large and, possibly, infinite. Hence the existence of operators $\mathbb{Q}$
which increase the dissipativity of the equation is not surprising. The last
issue related to this topic is the presentation of an operator $\mathbb{Q}$ which has a fluid
dynamics interpretation and satisfies the previous property. For a general
domain $D$ this definition is a bit implicit. Thus in the last section we will present
some more explicit computations.

Let us fix a parameter £ such that 0 < £ < ô , consider a smooth probability 
density function YW : R? — R with compact support in B(0, 1) and let us denote by 
K(x, y) the Biot-Savart kernel in D. We recall that a point vortex in xo has vorticity 


ôx, and smoothing it by W(x) := aw (+), then it has vorticity aw (==). 

Now let us consider a random variable $X_0$ distributed uniformly on $D_{2\delta}$, a real
random variable $\Gamma$ such that


n= = E[ TG] <00 
and set 

w(x) =o f K (3, 9) 8xly ~ Xoddy 
=: IK; (x, Xo). 


If we consider in Eq. (5) the Brownian motion W (t, x), with covariance operator 


Qe (x, y) = gE [Ke (x, Xo) Q Ke(y, Xo)] 


one has 


vT O¢ (x,x)u = 62 t [IKe(x, Xo) - vl? for v € RÊ 


2 
(Qw, w) = eg D) Y w (x) - Ke(x, Xoax) l for w € L?(D; R3). 
D 


These identities hold the key to making $v^T Q(x, x) v$ large and
$\langle\mathbb{Q}w, w\rangle$ small. Moreover, the law of $u$ on the space of divergence-free square
integrable vector fields with null normal trace is, heuristically, that of a Poisson point
process generating smoothed point vortices (and the associated velocity field) at
random positions of $D$. Thus this kind of noise is a reasonable model of what we
expect from the heuristic analysis described in Sect. 2.


Theorem 8 


— There exists a constant C > Q such that 
2 2 
(Qev, v) < Ceg lvli 


for every v € H and £ > Q. 
— For every x € D, let qe (x) > 0 be the largest number such that 


v? Qe (x, x) v > qe (x) lvl? 
forall v € R?. Then 


lim inf = +00. 
Loo xE Dos u (a) i 

In the last result of this section the presence of the external source $q$ is crucial.
Moreover we assume that $q$ is independent of time and introduce the stationary
solution of Eq. (6)


$$\Theta_{st} = -A^{-1}q.$$

In fact we want to study the convergence of the solution of the stochastic Eq. (5)
to $\Theta_{st}$.
Set 


Coo (00. 4) = sup E116 (1) 120] 
t>0 
Theorem 9 If $\theta_0 \in L^2(\mathcal{F}_0, D(A))$ and $q \in D(A)$, then


- $C_\infty(\theta_0, q) < \infty$.
- For every $\phi \in H$


lim sup [|(6 0) — Osr. @)1?] < a l$l? Coo (0, 4) - 


too 


In order to be of interest for applications, this theorem requires two conditions:


(1) that $\epsilon_Q$ is small.
(2) that $\Theta_{st}$ is significantly affected by the noise.


Clearly, if $\kappa\lambda_{\mathcal{L}} \gg \kappa\lambda$ then $\Theta_{st}$ is significantly affected by the noise. Thus we
are brought back to the framework already treated. In Sect. 4 we will
show a concrete example where this phenomenon appears.


4 Explicit Computations 


Theorem 8 is not completely suitable for numerical simulations because the
Biot-Savart kernel $K(x, y)$ is not explicitly available for every smooth
bounded domain. In this section we will present an explicit construction with a fluid
dynamics interpretation, again based on vortex structures, for which $\epsilon_Q$ is
arbitrarily small and $q$ is arbitrarily large outside a boundary layer. Moreover we will
show numerically that, even when the conditions of this construction are relaxed, the
noise influences the behavior of the stationary solution.

4.1 Explicit Construction


We will construct a noise of the form $\Gamma \sum_{k \in K} u_k(x)\, dW_t^k$ with


$$u_k(x) = w_r(x - x_k), \qquad w_r(x) = r^{-1} w\left(\frac{x}{r}\right)$$

for suitable $r$ and $w$. Thus, the covariance of this noise is


$$Q(x, y) = \Gamma^2 \sum_{z \in \Lambda_N} w_r(x - z) \otimes w_r(y - z).$$


We need to choose the points $x_k$, called the “centers” of the vortex blobs, and a suitable
vector field $w$. The vector field $w$ must satisfy several conditions:


1. $w$ is smooth and $\nabla \cdot w = 0$;
2. $w$ has compact support contained in $B(0, 1)$;
3. $w$ is close to $\frac{1}{2\pi}\frac{x^\perp}{|x|^2}$ near $x = 0$.


The first two properties ensure that the $u_k$'s model the velocity of
an incompressible fluid at rest. The third one reflects our idea of vortex structures.

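Now we choose the centers. For a fixed $\delta > 0$, we choose a positive integer $N$
such that $\frac{1}{N} < \delta$. Then we consider the set $\Lambda_N$ of all points of $D_\delta$ having coordinates
of the form $\left(\frac{k}{N}, \frac{h}{N}\right)$ with $k, h \in \mathbb{Z}$. Thanks to this choice we have


$$\min_{z_1 \ne z_2 \in \Lambda_N} |z_1 - z_2| = \frac{1}{N}, \qquad \min_{z \in \Lambda_N} d(z, \partial D) \ge \delta.$$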


We choose another positive integer $M$ and we decompose the set $\Lambda_N$ as the disjoint
union of the sets


$$\Lambda_N = \bigcup_{(k_0, h_0) \in \{0, 1, \ldots, M-1\}^2} \Lambda_N^{(M, k_0, h_0)}$$


where $\left(\frac{k}{N}, \frac{h}{N}\right) \in \Lambda_N^{(M, k_0, h_0)}$ if $k = Mn + k_0$, $h = Mm + h_0$, with $n, m \in \mathbb{Z}$. In this
way, we have


$$\min_{z_1 \ne z_2 \in \Lambda_N^{(M, k_0, h_0)}} |z_1 - z_2| = \frac{M}{N}$$


for each $(k_0, h_0) \in \{0, 1, \ldots, M-1\}^2$. We have introduced $M$ and the sets
$\Lambda_N^{(M, k_0, h_0)}$ in order to have that each pair $u_j$ and $u_k$ in the same class have
disjoint supports for $r$ small enough and this is sufficient for our estimates, because
it implies that the vector fields are “almost” orthogonal in the sense of Theorem 5.
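
The decomposition above can be checked on a small example. The sketch below assumes, purely for illustration, that $D$ is the unit square, so that the distance to the boundary is explicit; it works with the integer indices $(k, h)$ rather than the points $(k/N, h/N)$, so all distances are in units of $1/N$. The function names are hypothetical.

```python
import itertools
import math

def lattice_classes(N, M, delta):
    """Group lattice indices (k, h) of points in D_delta by (k mod M, h mod M)."""
    # (k/N, h/N) lies at distance > delta from the boundary of the unit square
    idx = [k for k in range(N + 1) if min(k, N - k) > delta * N]
    classes = {}
    for k, h in itertools.product(idx, idx):
        classes.setdefault((k % M, h % M), []).append((k, h))
    return classes

def min_index_distance(points):
    """Smallest Euclidean distance between two distinct index pairs."""
    return min(math.dist(p, q) for p, q in itertools.combinations(points, 2))

classes = lattice_classes(N=20, M=2, delta=0.1)          # here 1/N < delta
all_points = [p for cls in classes.values() for p in cls]
spacing_all = min_index_distance(all_points)              # in units of 1/N
spacing_per_class = [min_index_distance(cls) for cls in classes.values()]
```

In this example the full lattice has spacing $1$ (that is, $1/N$), while each of the $M^2 = 4$ classes has spacing $M$ (that is, $M/N$), which is the separation used to make the supports within a class disjoint.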

In order to have disjoint supports for the elements of $\Lambda_N^{(M, k_0, h_0)}$ and to have the
action of the noise cover the full set $D_{2\delta}$ we ask $r \le \frac{M}{2N}$. Now we can focus on
the vector field $w$. In order for it to be divergence free we set $w = \nabla^\perp\psi$. Thus, we
look for a smooth function $\psi$ on $\mathbb{R}^2$, compactly supported in $B(0, 1)$, close to
$\frac{1}{2\pi}\log|x|$ near $x = 0$. A possible construction is the following one:


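$$\psi(x) = \int_{\mathbb{R}^2} \psi_0(x - y)\, f_\epsilon(y)\,dy$$


where $f_\epsilon$ is a mollifier with support in $B(0, \epsilon)$ and $\psi_0$ is a $C^\infty(\mathbb{R}^2 \setminus \{0\})$ radial
function such that

$$\psi_0(x) = \frac{\log|x|}{2\pi} \ \text{ for } |x| \le \frac13 \quad \text{and} \quad \psi_0(x) = 0 \ \text{ for } |x| \ge \frac23.$$

Moreover, it can be proved that $w$ defined above satisfies

$$\|w\|^2 \le C \log\frac{1}{\epsilon}, \qquad \|w_r\|^2 = \|w\|^2.$$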
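
Both relations can be made plausible by a direct computation. The sketch below is heuristic: it takes $w_r(x) = r^{-1} w(x/r)$, assumes that the mollification only modifies the profile in a ball of radius of order $\epsilon$, ignores the smooth cut-off region, and writes $c$ for the outer radius up to which $w$ coincides with $\frac{1}{2\pi}\frac{x^\perp}{|x|^2}$. These simplifications are ours and are not taken from the cited proofs.

```latex
\begin{align*}
\|w_r\|^2 &= \int_{\mathbb{R}^2} r^{-2} \Big| w\Big(\frac{x}{r}\Big) \Big|^2 dx
           = \int_{\mathbb{R}^2} |w(y)|^2 \, dy = \|w\|^2
           \qquad (x = r y,\ dx = r^2\, dy), \\
\int_{\epsilon \le |x| \le c} \Big| \frac{1}{2\pi} \frac{x^\perp}{|x|^2} \Big|^2 dx
          &= \int_\epsilon^{c} \frac{1}{4\pi^2 \rho^2} \, 2\pi\rho \, d\rho
           = \frac{1}{2\pi} \log\frac{c}{\epsilon} .
\end{align*}
```

The first line shows that this scaling is exactly the one that preserves the $L^2$ norm in two dimensions; the second indicates where a factor of order $\log\frac{1}{\epsilon}$ comes from.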


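Thanks to these relations we can easily obtain an estimate of $\epsilon_Q$:


$$\begin{aligned}
\int_D\int_D v(x)^T Q(x, y)\, v(y)\,dx\,dy
 &= \Gamma^2 \sum_{z \in \Lambda_N} \left(\int_D w_r(x - z) \cdot v(x)\,dx\right)^2 \\
 &= \|w\|^2\, \Gamma^2 \sum_{(k_0, h_0) \in \{0, 1, \ldots, M-1\}^2}\ \sum_{z \in \Lambda_N^{(M, k_0, h_0)}} \left(\int_D \frac{w_r(x - z)}{\|w_r\|} \cdot v(x)\,dx\right)^2 \\
 &\le M^2\, \|w\|^2\, \Gamma^2\, \|v\|^2 .
\end{aligned}$$


Thus, taking $\epsilon = \frac{1}{N}$ we get


$$\epsilon_Q \le M^2\, \Gamma^2\, C \log N$$


which is small if, given $N$, $\Gamma^2$ is small enough.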

Concerning a lower bound for $q(x)$ in $D_{2\delta}$, the computations
are a bit more involved and we refer to [7] for a complete discussion, which is beyond
our scope. We just claim that if


en 


r> M > 24, N is large enough 


N ’ 
then 


2 


q > 
q(x) 2 16x 


in D25. 

4.2 Numerical Simulation


Summing up the results of the previous subsection, we have seen that if


2 M 2% 2 12 1 < 1 Nis1 h 
r< —, r i r> —, e=— <-, is large enou 
=3 N N-6 Boerne 
we have 
2 72 5 rN . 
£o < MI’ ClogN, 40) Z Ter in D5. 


These conditions are demanding from the numerical point of view: the cardinality of $K$
must be very large and a finite but not small $M$ is required. Certain supports have
to overlap so that the noise acts everywhere. However in [10] it has been shown,
numerically, that these conditions are more than sufficient and much less is required to
see the influence of the noise on the solution, namely that $\Theta_{st}$ differs significantly
from the parabolic profile even for relatively modest sets $K$ and for $M = 1$. In this
subsection we work in an infinite 2D channel and drop the requirement that
$q$, $\Theta$ decay at infinity, although this setting is not strictly covered by the theory
described in Sect. 2.1. We assume that the function $q(x)$ is equal to a constant $q$.
For numerical reasons we consider the problem in the bounded domain 


D = (tan(—1.54), tan(1.54)) x (—0.1, 0.1). 


In order to have that the $\sigma_k$'s model a fluid at rest, we can take

$$r < \max_{k \in K} d(\partial D, x_k) \quad \text{and} \quad \epsilon < \frac16.$$

These are the real constraints on the parameters of our numerical simulation. The
other parameters $\Gamma$, $K$, $\{x_k\}_{k \in K}$ can be chosen more freely in order to obtain
satisfactory results.

Differently from [10], here the vortex structures have not been placed on a grid
equally spaced in both directions. In particular the points thicken in the $x_1$ direction.
We have chosen 2 points in the $z$ direction between $-0.05$ and $0.05$ and, as
concerns the $x_1$ direction, we have chosen 2 points between 0 and 0.2, 4 points
between 0.2 and 0.4 and 8 points between 0.4 and 0.6. In order to improve the
smoothness of the solution, avoiding an abrupt change in the number of vortices, we
prefer to include a few vortices for $x_1 > 0.6$. They only slightly affect the behavior
of our solution in the critical region of interest $x_1 < 0.5$. Thus we consider 4
points between 0.6 and 0.8 and 2 points between 0.8 and 1. Obviously we avoid
repetition of the vortices. In conclusion we have 34 vortices. Moreover, we take
$r = 0.05$, $\epsilon = 0.1$ and $\Gamma = 0.03$. The other parameters of the problem are $\kappa = 0.05$
and $q = 1$. In this way the quantities $M$ and $N$ are not well defined and the impact
of the operator $\mathcal{L}$ is limited to a small portion of the domain $D$; however we can
fully appreciate how it changes the profile of the solution.


Fig. 1 Solution in the critical region


Figures 1 and 2 illustrate the modification of the profile, from the standard
parabolic one of free diffusion in a steady medium, to the case of turbulent decay.
Even though we use only a small number of vortices, we can observe a significant
modification of the profile due to turbulence where the vortices thicken.
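
To make the reference profile concrete, here is a minimal one-dimensional sketch; it is not the two-dimensional simulation behind Figs. 1 and 2. It assumes that, across the channel, the stationary problem reduces to $-(k(z)\Theta'(z))' = q$ with $\Theta(\pm a) = 0$, where $k$ equals the molecular diffusivity without noise and a hypothetical larger effective diffusivity with noise. The function name, the finite-difference discretisation and the Gaussian bump are illustrative assumptions; only the values $0.05$ for the diffusivity, $1$ for the source and $0.1$ for the half-width are taken from the text. NumPy is assumed.

```python
import numpy as np

def stationary_profile(k_eff, q, half_width, n):
    """Solve -(k_eff(z) Theta')' = q on (-a, a) with Theta(-a) = Theta(a) = 0."""
    z = np.linspace(-half_width, half_width, n + 2)   # grid including both walls
    h = z[1] - z[0]
    k_half = k_eff(0.5 * (z[:-1] + z[1:]))            # diffusivity at cell interfaces
    main = (k_half[:-1] + k_half[1:]) / h**2
    off = -k_half[1:-1] / h**2
    A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    theta = np.linalg.solve(A, np.full(n, q))
    return z[1:-1], theta

kappa, q, a = 0.05, 1.0, 0.1
z, theta_free = stationary_profile(lambda s: np.full_like(s, kappa), q, a, 199)
# Hypothetical eddy diffusivity: kappa plus a bump in the middle of the channel.
_, theta_eddy = stationary_profile(
    lambda s: kappa + 0.05 * np.exp(-(s / 0.05) ** 2), q, a, 199)
parabola = q * (a**2 - z**2) / (2 * kappa)            # exact profile for constant kappa
```

With constant diffusivity the computed profile coincides with the parabola $q(a^2 - z^2)/(2\kappa)$, and in this example the added effective diffusivity lowers the maximum of the profile.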

Fig. 2 Profiles at different values of $x_1$ (legend: $x_1 = 0$, $x_1 = 0.1$, $x_1 = 0.2$, $x_1 = 0.42$; horizontal axis from $-0.1$ to $0.1$)


Existence and Uniqueness of Maximal
Solutions to a 3D Navier-Stokes Equation
with Stochastic Lie Transport


Daniel Goodair 


Abstract We present here a criterion to conclude that an abstract SPDE possesses 
a unique maximal strong solution, which we apply to a three dimensional Stochastic 
Navier-Stokes Equation. Motivated by the work of Kato and Lai we ask that there is 
a comparable result here in the stochastic case whilst facilitating a variety of noise 
structures such as additive, multiplicative and transport. In particular our criterion 
is designed to fit viscous fluid dynamics models with Stochastic Advection by 
Lie Transport (SALT) as introduced in Holm (Proc R Soc A: Math Phys Eng Sci 
471(2176):20140963, 2015). Our application to the Incompressible Navier-Stokes 
equation matches the existence and uniqueness result of the deterministic theory. 
This short work summarises the results and announces two papers (Crisan et al., 
Existence and uniqueness of maximal strong solutions to nonlinear SPDEs with 
applications to viscous fluid models, in preparation; Crisan and Goodair, Analytical 
properties of a 3D stochastic Navier-Stokes equation, 2022, in preparation) which 
give the full details for the abstract well-posedness arguments and application to the 
Navier-Stokes Equation respectively. 


Keywords Stochastic transport - SPDE - Navier-Stokes - Well-posedness 


1 Introduction 


The theoretical analysis of fluid models perturbed by transport noise has been in 
significant demand since the release of the seminal works [16] and [17]. In the 
papers Holm and Mémin establish a new class of stochastic equations driven by 
transport noise which serve as much improved fluid dynamics models by adding 
uncertainty in the transport of the fluid parcels to reflect the unresolved scales. Here 
we consider the SALT [16] Navier-Stokes Equation given by 

$$u_t - u_0 + \int_0^t \mathcal{L}_{u_s} u_s\,ds - \int_0^t \Delta u_s\,ds + \int_0^t B u_s \circ d\mathcal{W}_s + \int_0^t \nabla p_s\,ds = 0 \tag{1}$$


and supplemented with the divergence-free (incompressibility) and zero-average
conditions on the three dimensional torus $\mathbb{T}^3$. The equation is presented here in
velocity form where $u$ represents the fluid velocity, $p$ the pressure, $\mathcal{L}$ is the mapping
corresponding to the nonlinear term, $\mathcal{W}$ is a cylindrical Brownian Motion and $B$
is the relevant transport operator defined with respect to a collection of functions
$(\xi_i)$ which physically represent spatial correlations. The explicit meaning of these
conditions and the definitions of the operators involved are given at the beginning
of Sect. 2.2. These $(\xi_i)$ can be determined at coarse-grain resolutions from finely
resolved numerical simulations, and mathematically are derived as eigenvectors of
a velocity-velocity correlation matrix (see [3, 4, 5]). The corresponding stochastic
Euler equation was derived in [12] and the viscous term plays no additional role in
the stochastic derivation (without loss of generality we set the viscosity coefficient
to be 1).

There has been limited progress in proving well-posedness for this class of 
equations: Crisan, Flandoli and Holm [5] have shown local existence and uniqueness 
for the 3D Euler Equation on the torus, whilst Crisan and Lang [9, 11, 10] 
demonstrated the same result for the Euler, Rotating Shallow Water and Great 
Lake Equations on the torus once more. Whilst this represents a strong start in the 
theoretical analysis (alongside works for SPDEs with general transport noise e.g. 
[2, 1]), the modelling literature continues to expand in both the deterministic fluid 
models (see for example Figure 2 of [8] and the analysis therein) and method of 
stochastic perturbation (for example we may soon look to introduce nonlinearity
and time dependence in the $(\xi_i)$). The significance of an abstract approach to the
well-posedness question is clear, and whilst we discuss here only an application to 
SALT Navier-Stokes [16, 12] the hope is that other stochastic viscous fluid models 
can be similarly solved by simply checking the required assumptions. We state our 
equation in the form 


$$\Psi_t = \Psi_0 + \int_0^t \mathcal{A}(s, \Psi_s)\,ds + \int_0^t \mathcal{G}(s, \Psi_s)\,d\mathcal{W}_s \tag{2}$$


for operators A and G to be elucidated in due course. The most notable contribution 
to the well-posedness theory for an abstract nonlinear SPDE is from [13]. Here the 
authors prove the existence of a unique maximal solution to their abstract equation 
and apply this to the three dimensional primitive equations with a Lipschitz type 
multiplicative noise. The class of equations which we are concerned with include a 
differential operator in the noise term, preventing us from applying this framework. 
Moreover the assumptions on their operator A are quite explicit in terms of the 
sum of the standard fluid nonlinear term and a linear operator, which we don’t 
restrict ourselves to. Overall our assumptions are much more general and allow
for a straightforward application to a wider class of SPDEs. Another relevant piece
here is the work of Glatt-Holtz and Ziane [14], who show the same existence and
uniqueness for the incompressible 3D Navier-Stokes equation, again with a Lipschitz noise
term. Though we cannot apply this method in the presence of our transport noise
we look to adapt this argument to fit not just our Navier-Stokes equation but the 
wider class of stochastic viscous fluid models and SPDEs beyond. The impact of 
the boundary is fundamental to the equation and the approach of Glatt-Holtz and 
Ziane copes with the arising issues by working in the right function spaces; we 
recognised the importance of this in establishing an abstract framework which we 
hope to apply to such stochastic transport equations on the bounded domain as well. 

This short summary work contains three more sections: in the subsequent one 
we properly define our Stochastic Navier-Stokes equation through the operators 
involved, the relevant function spaces, the notions of solution and main results. 
Following this we concretely define our abstract formulation and notion of solution, 
giving the assumptions that we require and the main results for the abstract equation. 
These assumptions are then all that needs to be checked to conclude the relevant 
existence and uniqueness for the proposed SPDE. In the final section we discuss 
the key steps behind proving these results; in the spirit of this as a summary work 
announcing our results we do not give a complete proof, though all such arguments 
are to be found in [7]. We then address how our Navier-Stokes equation fits the 
context of the abstract formulation, though once more we do not give a thorough 
justification that the operators of our equation satisfy the required assumptions, with 
this precise treatment to come in [6]. 


2 SALT Navier-Stokes and Results 


As alluded to, in this section we formally introduce Eq. (1) and state the main results.


2.1 Preliminaries from Stochastic Analysis 


Throughout the paper we work with a fixed filtered probability space
$(\Omega, \mathcal{F}, (\mathcal{F}_t), \mathbb{P})$ satisfying the usual conditions of completeness and right continuity.
We take $\mathcal{W}$ to be a cylindrical Brownian Motion over some Hilbert Space $\mathfrak{U}$ with
orthonormal basis $(e_i)$. The choice of $\mathfrak{U}$ and the subsequent basis play no role
in the analysis. Recall ([15, Subsection 1.4]) that $\mathcal{W}$ admits the representation
$\mathcal{W}_t = \sum_{i=1}^\infty e_i W_t^i$ as a limit in $L^2(\Omega; \mathfrak{U}')$ whereby the $(W^i)$ are a
collection of i.i.d. standard real valued Brownian Motions and $\mathfrak{U}'$ is an enlargement
of the Hilbert Space $\mathfrak{U}$ such that the embedding $J : \mathfrak{U} \to \mathfrak{U}'$ is
Hilbert-Schmidt and $\mathcal{W}$ is a $JJ^*$-cylindrical Brownian Motion over $\mathfrak{U}'$. Given a process
$F : [0, T] \times \Omega \to \mathscr{L}^2(\mathfrak{U}; \mathscr{H})$ progressively measurable and such that
$F \in L^2(\Omega \times [0, T]; \mathscr{L}^2(\mathfrak{U}; \mathscr{H}))$, for any $0 \le t \le T$ we understand the stochastic
integral

$$\int_0^t F_s\,d\mathcal{W}_s$$


to be the infinite sum

$$\sum_{i=1}^\infty \int_0^t F_s(e_i)\,dW_s^i$$


taken in $L^2(\Omega; \mathscr{H})$. We can extend this notion to processes $F$ which are such that
$F(\omega) \in L^2([0, T]; \mathscr{L}^2(\mathfrak{U}; \mathscr{H}))$ for $\mathbb{P}$-a.e. $\omega$ via the traditional localisation
procedure. In this case the stochastic integral is a local martingale in $\mathscr{H}$. A
complete, direct construction of this integral, a treatment of its properties and the
fundamentals of stochastic calculus in infinite dimensions can be found in [15,
Section 1].


2.2 SALT Navier-Stokes Equation 


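We present Eq. (1) on the three dimensional torus $\mathbb{T}^3$ (noting that all results hold
on $\mathbb{T}^2$), and detail now the operators involved alongside the function spaces which
define the equations. The operator $\mathcal{L}$ is defined for sufficiently regular functions
$\phi, \psi : \mathbb{T}^3 \to \mathbb{R}^3$ by


$$\mathcal{L}_\phi\psi := \sum_{j=1}^3 \phi^j \partial_j\psi$$


where $\phi^j : \mathbb{T}^3 \to \mathbb{R}$ is the $j^{\text{th}}$ coordinate mapping of $\phi$ and $\partial_j\psi$ is defined by its $k^{\text{th}}$
coordinate mapping $(\partial_j\psi)^k = \partial_j\psi^k$. The operator $B$ is defined as a linear operator
on $\mathfrak{U}$ (introduced in Sect. 2.1) by its action on the basis vectors $B(e_i, \cdot) := B_i(\cdot)$ by


$$B_i = \mathcal{L}_{\xi_i} + \mathcal{T}_{\xi_i}$$

for $\mathcal{L}$ as above and

$$\mathcal{T}_\phi\psi := \sum_{j=1}^3 \psi^j \nabla\phi^j.$$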
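
The two operators entering $B_i$ can be written down directly on a periodic grid. The sketch below assumes NumPy, a torus of side $2\pi$ discretised with `n` points per direction, and FFT-based derivatives; the function names `partial`, `L_op` and `T_op` are ours, and the example only checks the component formulas $(\mathcal{L}_\phi\psi)^k = \sum_j \phi^j \partial_j\psi^k$ and $(\mathcal{T}_\phi\psi)^k = \sum_j \psi^j \partial_k\phi^j$ on simple trigonometric fields.

```python
import numpy as np

def partial(f, axis):
    """Spectral derivative of a scalar field on the torus [0, 2*pi)^3 along `axis`."""
    n = f.shape[axis]
    k = np.fft.fftfreq(n, d=1.0 / n)            # integer wavenumbers
    shape = [1, 1, 1]
    shape[axis] = n
    f_hat = np.fft.fft(f, axis=axis)
    return np.real(np.fft.ifft(1j * k.reshape(shape) * f_hat, axis=axis))

def L_op(phi, psi):
    """(L_phi psi)^k = sum_j phi^j d_j psi^k for fields of shape (3, n, n, n)."""
    return np.stack([sum(phi[j] * partial(psi[k], j) for j in range(3))
                     for k in range(3)])

def T_op(phi, psi):
    """(T_phi psi)^k = sum_j psi^j d_k phi^j for fields of shape (3, n, n, n)."""
    return np.stack([sum(psi[j] * partial(phi[j], k) for j in range(3))
                     for k in range(3)])

n = 16
x = 2 * np.pi * np.arange(n) / n
X, Y, Z = np.meshgrid(x, x, x, indexing="ij")
zero, one = np.zeros_like(X), np.ones_like(X)
e1 = np.stack([one, zero, zero])                 # constant field (1, 0, 0)
psi = np.stack([np.sin(X), np.cos(Y), np.sin(Z)])
phi = np.stack([np.sin(Y), zero, zero])
```

For a constant field the second operator vanishes and the first reduces to a directional derivative, so only the transport part of $B_i$ survives when the corresponding $\xi_i$ is constant.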


A complete discussion of how $B$ is then defined on $\mathfrak{U}$ is given in [15, Subsection
2.2]. We embed the divergence-free and zero-average conditions into the relevant
function spaces and simply define our solutions as belonging to these spaces. To be
explicit, by a divergence-free function we mean a $\phi \in W^{1,2}(\mathbb{T}^3; \mathbb{R}^3)$ such that

$$\sum_{j=1}^3 \partial_j\phi^j = 0$$

and by zero-average we ask for a $\psi \in L^2(\mathbb{T}^3; \mathbb{R}^3)$ with the property

$$\int_{\mathbb{T}^3} \psi\,d\lambda = 0$$


for $\lambda$ the Lebesgue measure on $\mathbb{T}^3$. We introduce the space $L^2_\sigma(\mathbb{T}^3; \mathbb{R}^3)$ as the
subspace of $L^2(\mathbb{T}^3; \mathbb{R}^3)$ consisting of zero-average functions which are ‘weakly
divergence-free’; see [18] Definition 2.1 for the precise construction. $W^{1,2}_\sigma(\mathbb{T}^3; \mathbb{R}^3)$
is then defined as the subspace of $W^{1,2}(\mathbb{T}^3; \mathbb{R}^3)$ consisting of zero-average
divergence-free functions, and $W^{2,2}_\sigma(\mathbb{T}^3; \mathbb{R}^3) := W^{2,2}(\mathbb{T}^3; \mathbb{R}^3) \cap W^{1,2}_\sigma(\mathbb{T}^3; \mathbb{R}^3)$.

As is standard in the treatment of the incompressible Navier-Stokes Equation we 
consider a projected version to eliminate the pressure term and facilitate us working 
in the above spaces. Note that the pressure $\rho$ does not come with an evolution equation and
is simply chosen to ensure the incompressibility condition. The idea is to solve
the projected equation and then append a pressure to it, see [18]. To this end we
introduce the standard Leray Projector $P$ defined as the orthogonal projection in
$L^2(\mathbb{T}^3; \mathbb{R}^3)$ onto $L^2_\sigma(\mathbb{T}^3; \mathbb{R}^3)$. As we look to project equation (1) as discussed, we
ought to address the Stratonovich integral. We look to convert this term into an Itô
integral to enable our analysis, but the resulting converted and projected equation
should not depend on the order in which the projection and conversion occur. To
this end we assume that the $(\xi_i)$ are such that $\xi_i \in W^{1,2}_\sigma(\mathbb{T}^3; \mathbb{R}^3) \cap W^{3,\infty}(\mathbb{T}^3; \mathbb{R}^3)$
and satisfy the bound

$$\sum_{i=1}^{\infty} \|\xi_i\|_{W^{3,\infty}}^2 < \infty. \tag{3}$$

The significance of the bound (3) will be revisited, but for now we note that as each
$\xi_i$ is divergence-free then each $B_i$ satisfies the property that $P B_i$ is equal to $P B_i P$
on $W^{1,2}(\mathbb{T}^3; \mathbb{R}^3)$, which ensures that the projection and conversion commute. Our
new equation is then

$$u_t - u_0 + \int_0^t P \mathcal{L}_{u_s} u_s \, ds + \int_0^t A u_s \, ds - \frac{1}{2} \sum_{i=1}^{\infty} \int_0^t P B_i^2 u_s \, ds + \sum_{i=1}^{\infty} \int_0^t P B_i u_s \, dW^i_s = 0 \tag{4}$$


where $A := -P\Delta$ is known as the Stokes Operator. Details of the Itô-Stratonovich
conversion can be found in [15, Subsection 2.3]. We shall use the Stokes operator
to define inner products with which we equip our function spaces. Recall from
[18] Theorem 2.24 for example that there exists a collection of functions $(a_k)$,
$a_k \in W^{1,2}_\sigma(\mathbb{T}^3; \mathbb{R}^3) \cap C^\infty(\mathbb{T}^3; \mathbb{R}^3)$ such that the $(a_k)$ are eigenfunctions of $A$, are
an orthonormal basis in $L^2_\sigma(\mathbb{T}^3; \mathbb{R}^3)$ and an orthogonal basis in $W^{1,2}_\sigma(\mathbb{T}^3; \mathbb{R}^3)$
considered as Hilbert Spaces with standard $L^2(\mathbb{T}^3; \mathbb{R}^3)$, $W^{1,2}(\mathbb{T}^3; \mathbb{R}^3)$ inner products.
The corresponding eigenvalues $(\lambda_k)$ are strictly positive and approach infinity as
$k \to \infty$. Thus any $\phi \in L^2_\sigma(\mathbb{T}^3; \mathbb{R}^3)$ admits the representation

$$\phi = \sum_{k=1}^{\infty} \phi_k a_k$$

so for $m \in \mathbb{N}$ we can define $A^{m/2}$ by

$$A^{m/2} \phi := \sum_{k=1}^{\infty} \lambda_k^{m/2} \phi_k a_k$$

which is a well defined element of $L^2_\sigma(\mathbb{T}^3; \mathbb{R}^3)$ on any $\phi$ such that

$$\sum_{k=1}^{\infty} \lambda_k^{m} \phi_k^2 < \infty. \tag{5}$$
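
To make the weighted sum in (5) concrete, the following Python sketch evaluates $\sum_k \lambda_k^m |\phi_k|^2$ in a one-dimensional scalar analogue. This is an illustrative assumption and not the setting of the paper: the Stokes eigenfunctions are replaced by complex Fourier modes on $[0, 2\pi)$, the eigenvalues by $\lambda_k = k^2$, and the function name `spectral_norm_sq` is hypothetical.

```python
import numpy as np


def spectral_norm_sq(samples, m):
    """Approximate sum_k lambda_k^m |phi_k|^2 for a periodic function.

    Toy 1D analogue (an assumption of this sketch): eigenfunctions are
    Fourier modes on [0, 2*pi), eigenvalues are lambda_k = k^2, and the
    zero mode is discarded to mimic the zero-average condition.
    """
    n = samples.size
    coeffs = np.fft.fft(samples) / n       # Fourier coefficients phi_k
    k = np.fft.fftfreq(n, d=1.0 / n)       # integer wavenumbers
    nonzero = k != 0                       # drop the mean
    lam = k[nonzero] ** 2                  # eigenvalues of -d^2/dx^2
    return float(np.sum(lam ** m * np.abs(coeffs[nonzero]) ** 2))


x = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
phi = np.sin(3.0 * x)                      # only wavenumbers +3 and -3 are active

norm0 = spectral_norm_sq(phi, 0)           # analogue of the L^2 norm squared
norm1 = spectral_norm_sq(phi, 1)           # analogue of <phi, phi>_1
norm2 = spectral_norm_sq(phi, 2)           # analogue of <phi, phi>_2
```

For this single-mode example each additional power of the operator multiplies the squared norm by the eigenvalue 9, which is the mechanism by which condition (5) becomes more restrictive as $m$ grows.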


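For $\phi, \psi$ with the property (5) then the bilinear form

$$\langle \phi, \psi \rangle_m := \langle A^{m/2} \phi, A^{m/2} \psi \rangle$$

is well defined. For $m = 1, 2$ this is an inner product on the spaces
$W^{1,2}_\sigma(\mathbb{T}^3; \mathbb{R}^3)$, $W^{2,2}_\sigma(\mathbb{T}^3; \mathbb{R}^3)$ respectively which is equivalent to the standard
$W^{1,2}(\mathbb{T}^3; \mathbb{R}^3)$, $W^{2,2}(\mathbb{T}^3; \mathbb{R}^3)$ inner product. Of course $\langle \cdot, \cdot \rangle_3$ is well defined on
$\bigcup_{k=1}^{\infty} \mathrm{span}\{a_1, \dots, a_k\}$ and so we define $W^{3,2}_\sigma(\mathbb{T}^3; \mathbb{R}^3)$ as the completion of
$\bigcup_{k=1}^{\infty} \mathrm{span}\{a_1, \dots, a_k\}$ in this inner product. We consider $W^{m,2}_\sigma(\mathbb{T}^3; \mathbb{R}^3)$ as a
Hilbert Space equipped with the $\langle \cdot, \cdot \rangle_m$ inner product, and define our solution to the
equation (4) relative to these spaces.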
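
The origin of the correction term $-\frac{1}{2}\sum_i \int_0^t P B_i^2 u_s \, ds$ in (4) can be traced through a short formal computation. The sketch below is heuristic: it assumes enough regularity for $B_i$ to be applied twice and for all sums to converge, and it takes the route of projecting first and converting second. The rigorous treatment is the one cited in [15, Subsection 2.3].

```latex
% Stratonovich-to-Ito conversion for one noise component (formal).
\int_0^t P B_i u_s \circ dW^i_s
  = \int_0^t P B_i u_s \, dW^i_s + \tfrac{1}{2} \, [P B_i u, W^i]_t .
% In the projected equation the martingale part of u is
% -\sum_j \int P B_j u \, dW^j, and the W^j are independent, so
[P B_i u, W^i]_t = - \int_0^t (P B_i)^2 u_s \, ds .
% Using P B_i = P B_i P on W^{1,2}:
(P B_i)^2 = P B_i P B_i = P B_i^2 ,
\qquad \text{hence} \qquad
\int_0^t P B_i u_s \circ dW^i_s
  = \int_0^t P B_i u_s \, dW^i_s - \tfrac{1}{2} \int_0^t P B_i^2 u_s \, ds .
```

The identity $(P B_i)^2 = P B_i^2$ is where the divergence-free assumption on the $(\xi_i)$ enters, and it is the reason the order of projection and conversion does not matter.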


2.3 Notions of Solution and Results 


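We frame this definition for an $\mathcal{F}_0$-measurable $u_0 : \Omega \to W^{1,2}_\sigma(\mathbb{T}^3; \mathbb{R}^3)$. Here and
throughout we use the notation $\mathbb{1}$ for the indicator function.


Definition 1 A pair $(u, \tau)$ where $\tau$ is a $\mathbb{P}$-a.s. positive stopping time and $u$
is a process such that for $\mathbb{P}$-a.e. $\omega$, $u_\cdot(\omega) \in C([0,T]; W^{1,2}_\sigma(\mathbb{T}^3; \mathbb{R}^3))$ and
$u_\cdot(\omega) \mathbb{1}_{\cdot \le \tau(\omega)} \in L^2([0,T]; W^{2,2}_\sigma(\mathbb{T}^3; \mathbb{R}^3))$ for all $T > 0$ with $u_\cdot \mathbb{1}_{\cdot \le \tau}$ progressively
measurable in $W^{2,2}_\sigma(\mathbb{T}^3; \mathbb{R}^3)$, is said to be a local strong solution of the equation (4)
if the identity

$$u_t - u_0 + \int_0^{t \wedge \tau} P \mathcal{L}_{u_s} u_s \, ds + \int_0^{t \wedge \tau} A u_s \, ds - \frac{1}{2} \sum_{i=1}^{\infty} \int_0^{t \wedge \tau} P B_i^2 u_s \, ds + \sum_{i=1}^{\infty} \int_0^{t \wedge \tau} P B_i u_s \, dW^i_s = 0 \tag{6}$$

holds $\mathbb{P}$-a.s. in $L^2_\sigma(\mathbb{T}^3; \mathbb{R}^3)$ for all $t \ge 0$.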


We shall address why this definition makes sense in the abstract setting in Sect. 3.3, 
before then translating this abstract framework back to our Navier-Stokes Equation. 


Definition 2 A pair $(u, \Theta)$ such that there exists a sequence of stopping times $(\theta_j)$
which are $\mathbb{P}$-a.s. monotone increasing and convergent to $\Theta$, whereby $(u_{\cdot \wedge \theta_j}, \theta_j)$
is a local strong solution of the equation (4) for each $j$, is said to be a maximal
strong solution of the equation (4) if for any other pair $(v, \Gamma)$ with this property
then $\Theta \le \Gamma$ $\mathbb{P}$-a.s. implies $\Theta = \Gamma$ $\mathbb{P}$-a.s.


Definition 3 A maximal strong solution $(u, \Theta)$ of the equation (4) is said to be
unique if for any other such solution $(v, \Gamma)$, then $\Theta = \Gamma$ $\mathbb{P}$-a.s. and for all
$t \in [0, \Theta)$,

$$\mathbb{P}\left(\{\omega \in \Omega : u_t(\omega) = v_t(\omega)\}\right) = 1.$$


We can now state the main result of the paper. 


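Theorem 1 For any given $\mathcal{F}_0$-measurable $u_0 : \Omega \to W^{1,2}_\sigma(\mathbb{T}^3; \mathbb{R}^3)$, there exists
a unique maximal strong solution $(u, \Theta)$ of the equation (4). Moreover at $\mathbb{P}$-a.e.
$\omega$ for which $\Theta(\omega) < \infty$, we have that

$$\sup_{r \in [0, \Theta(\omega))} \|u_r(\omega)\|_1^2 + \int_0^{\Theta(\omega)} \|u_r(\omega)\|_2^2 \, dr = \infty. \tag{7}$$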


3 Abstract Framework and Results 


We now establish the abstract framework through which we arrive at Theorem 1. 
This involves giving two sets of assumptions before exploring the abstract method 
with the assumptions in place, and then in Sect. 4.2 discussing how (4) fits into this 
framework. These assumption sets pertain to two different notions of solution (both 
strong in the probabilistic sense but related to different spaces), the reason for which 
will be illustrated in Sect. 4. We give these as two distinct sets of assumptions in the 
event that an equation fits the first set of assumptions but not the second, such that 
we would still be able to conclude that some type of solution exists for the equation.


3.1 Assumption Set 1


We work with a quartet of continuously embedded Hilbert Spaces

$$V \hookrightarrow H \hookrightarrow U \hookrightarrow X$$

and the operators

$$A : [0, \infty) \times V \to U, \qquad G : [0, \infty) \times V \to \mathscr{L}^2(\mathfrak{U}; H).$$

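We ask that there is a continuous bilinear form $\langle \cdot, \cdot \rangle_{X \times H} : X \times H \to \mathbb{R}$ such that
for $\phi \in U$ and $\psi \in H$,

$$\langle \phi, \psi \rangle_{X \times H} = \langle \phi, \psi \rangle_U. \tag{8}$$

Moreover the continuity and bilinearity ensures that there exists some constant $c$
whereby for all $\phi \in X$ and $\psi \in H$,

$$|\langle \phi, \psi \rangle_{X \times H}| \le c \|\phi\|_X \|\psi\|_H. \tag{9}$$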


As we look to use a Galerkin Scheme to solve our equation, we introduce now a
sequence of spaces $(V_n)$ contained in $V$ given by $V_n := \mathrm{span}\{a_1, \dots, a_n\}$ for $(a_n)$
an orthogonal basis in $U$. Defining $P_n$ to be the orthogonal projection onto $V_n$ in $X$,
we shall also assume that the restriction of $P_n$ to $U$ is an orthogonal projection in $U$
and that the sequence of these projections is uniformly bounded on $H$: that is, that
there exists some constant $c$ independent of $n$ such that for all $\phi \in H$,

$$\|P_n \phi\|_H \le c \|\phi\|_H. \tag{10}$$

We also require the existence of a real valued sequence $(\mu_n)$ with $\mu_n \to \infty$, which
is such that for any $\phi \in U$ and $\psi \in H$,

$$\|(I - P_n) \phi\|_X \le \frac{1}{\mu_n} \|\phi\|_U, \tag{11}$$

$$\|(I - P_n) \psi\|_U \le \frac{1}{\mu_n} \|\psi\|_H, \tag{12}$$

where $I$ represents the identity operator in the corresponding spaces. These
assumptions are of course supplemented by a series of assumptions on the operators.
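
Conditions (11) and (12) are the usual spectral tail estimates. As a sketch in the concrete setting of Sect. 2.2, assume (this ordering is an assumption of the sketch, not stated in the text) that the eigenvalues are arranged so that $\lambda_1 \le \lambda_2 \le \cdots$, and take $X = L^2_\sigma$, $U = W^{1,2}_\sigma$ with the $\langle \cdot, \cdot \rangle_1$ inner product and $P_n$ the projection onto $\mathrm{span}\{a_1, \dots, a_n\}$. A candidate sequence is then $\mu_n = \lambda_{n+1}^{1/2}$:

```latex
\|(I - P_n)\phi\|_{L^2}^2
  = \sum_{k > n} \phi_k^2
  \le \frac{1}{\lambda_{n+1}} \sum_{k > n} \lambda_k \phi_k^2
  \le \frac{1}{\lambda_{n+1}} \|\phi\|_1^2 ,
\qquad
\|(I - P_n)\psi\|_1^2
  = \sum_{k > n} \lambda_k \psi_k^2
  \le \frac{1}{\lambda_{n+1}} \sum_{k > n} \lambda_k^2 \psi_k^2
  \le \frac{1}{\lambda_{n+1}} \|\psi\|_2^2 .
```

Since $\lambda_k \to \infty$, this choice satisfies $\mu_n \to \infty$ as required.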
We shall use general notation $c_t$ to represent a function $c_\cdot : [0, \infty) \to \mathbb{R}$ bounded
on $[0, T]$ for any $T > 0$, evaluated at the time $t$. Moreover we define functions $K$,
$\tilde{K}$ relative to some non-negative constants $p, \tilde{p}, q, \tilde{q}$. We use a generic notation
to define the functions $K : U \to \mathbb{R}$, $K : U \times U \to \mathbb{R}$, $\tilde{K} : H \to \mathbb{R}$ and
$\tilde{K} : H \times H \to \mathbb{R}$ by

$$K(\phi) := 1 + \|\phi\|_U^p,$$
$$K(\phi, \psi) := 1 + \|\phi\|_U^p + \|\psi\|_U^q,$$
$$\tilde{K}(\phi) := K(\phi) + \|\phi\|_H^{\tilde{p}},$$
$$\tilde{K}(\phi, \psi) := K(\phi, \psi) + \|\phi\|_H^{\tilde{p}} + \|\psi\|_H^{\tilde{q}}.$$

Distinct use of the function $K$ will depend on different constants but in no
meaningful way in our applications, hence no explicit reference to them shall be
made. In the case of $\tilde{K}$, when $\tilde{p}, \tilde{q} = 2$ then we shall denote the general $\tilde{K}$ by $\tilde{K}_2$.
In this case no further assumptions are made on the $p, q$. That is, $\tilde{K}_2$ has the general
representation

$$\tilde{K}_2(\phi, \psi) = K(\phi, \psi) + \|\phi\|_H^2 + \|\psi\|_H^2 \tag{13}$$

and similarly as a function of one variable.

We state the assumptions for arbitrary elements $\phi, \psi \in V$, $\phi^n \in V_n$ and $t \in
[0, \infty)$, and a fixed $\kappa > 0$. Understanding $G$ as an operator $G : [0, \infty) \times V \times \mathfrak{U} \to H$,
we introduce the notation $G_i(\cdot, \cdot) := G(\cdot, \cdot, e_i)$.


Assumption 1 For any $T > 0$, $A : [0, T] \times V \to U$ and $G : [0, T] \times V \to
\mathscr{L}^2(\mathfrak{U}; H)$ are measurable.


Remark 1 Measurability here and throughout the paper is defined with respect to
the Borel Sigma Algebra on the relevant Hilbert Spaces.


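Assumption 2

$$\|A(t, \phi)\|_U^2 + \sum_{i=1}^{\infty} \|G_i(t, \phi)\|_H^2 \le c_t K(\phi) \left[1 + \|\phi\|_V^2\right], \tag{14}$$

$$\|A(t, \phi) - A(t, \psi)\|_X^2 \le c_t \left[K(\phi, \psi) + \|\phi\|_V^2 + \|\psi\|_V^2\right] \|\phi - \psi\|_H^2, \tag{15}$$

$$\sum_{i=1}^{\infty} \|G_i(t, \phi) - G_i(t, \psi)\|_X^2 \le c_t K(\phi, \psi) \|\phi - \psi\|_H^2. \tag{16}$$


Assumption 3

$$2 \langle P_n A(t, \phi^n), \phi^n \rangle_H + \sum_{i=1}^{\infty} \|P_n G_i(t, \phi^n)\|_H^2 \le c_t \tilde{K}_2(\phi^n) \left[1 + \|\phi^n\|_H^2\right] - \kappa \|\phi^n\|_V^2, \tag{17}$$

$$\sum_{i=1}^{\infty} \langle P_n G_i(t, \phi^n), \phi^n \rangle_H^2 \le c_t \tilde{K}_2(\phi^n) \left[1 + \|\phi^n\|_H^4\right]. \tag{18}$$


Assumption 4

$$2 \langle A(t, \phi) - A(t, \psi), \phi - \psi \rangle_U + \sum_{i=1}^{\infty} \|G_i(t, \phi) - G_i(t, \psi)\|_U^2 \le c_t \tilde{K}_2(\phi, \psi) \|\phi - \psi\|_U^2 - \kappa \|\phi - \psi\|_H^2, \tag{19}$$

$$\sum_{i=1}^{\infty} \langle G_i(t, \phi) - G_i(t, \psi), \phi - \psi \rangle_U^2 \le c_t \tilde{K}_2(\phi, \psi) \|\phi - \psi\|_U^4. \tag{20}$$


Assumption 5

$$2 \langle A(t, \phi), \phi \rangle_U + \sum_{i=1}^{\infty} \|G_i(t, \phi)\|_U^2 \le c_t K(\phi) \left[1 + \|\phi\|_H^2\right], \tag{21}$$

$$\sum_{i=1}^{\infty} \langle G_i(t, \phi), \phi \rangle_U^2 \le c_t K(\phi) \left[1 + \|\phi\|_H^2\right]. \tag{22}$$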


3.2 Assumption Set 2 


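These assumptions are only checked in addition to Assumption Set 1 and so take
place in the same framework. We state the assumptions now for arbitrary elements
$\phi, \psi \in H$ and $t \in [0, \infty)$, and continue to use the $c$, $K$, $\tilde{K}$, $\kappa$ notation of
Assumption Set 1.


Assumption 6 For any $T > 0$, $A : [0, T] \times H \to X$ is measurable, and whenever
$\Phi$ is a progressively measurable process in $H$ we have that $G(\cdot, \Phi_\cdot)$ is progressively
measurable in $\mathscr{L}^2(\mathfrak{U}; U)$.


Assumption 7

$$\|A(t, \phi)\|_X^2 + \sum_{i=1}^{\infty} \|G_i(t, \phi)\|_U^2 \le c_t K(\phi) \left[1 + \|\phi\|_H^2\right], \tag{23}$$

$$\|A(t, \phi) - A(t, \psi)\|_X^2 \le c_t \left[K(\phi, \psi) + \|\phi\|_H^2 + \|\psi\|_H^2\right] \|\phi - \psi\|_H^2. \tag{24}$$


Assumption 8

$$2 \langle A(t, \phi) - A(t, \psi), \phi - \psi \rangle_X + \sum_{i=1}^{\infty} \|G_i(t, \phi) - G_i(t, \psi)\|_X^2 \le c_t \tilde{K}_2(\phi, \psi) \|\phi - \psi\|_X^2, \tag{25}$$

$$\sum_{i=1}^{\infty} \langle G_i(t, \phi) - G_i(t, \psi), \phi - \psi \rangle_X^2 \le c_t \tilde{K}_2(\phi, \psi) \|\phi - \psi\|_X^4. \tag{26}$$


We in fact state Assumption 9 for $\phi \in V$ and some $\kappa > 0$, making this a stronger
assumption than 5.


Assumption 9 With the stricter requirement that $\phi \in V$ then

$$2 \langle A(t, \phi), \phi \rangle_U + \sum_{i=1}^{\infty} \|G_i(t, \phi)\|_U^2 \le c_t K(\phi) - \kappa \|\phi\|_H^2, \tag{27}$$

$$\sum_{i=1}^{\infty} \langle G_i(t, \phi), \phi \rangle_U^2 \le c_t K(\phi). \tag{28}$$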


3.3 Notions of Solution and Results 


Here we define the two different notions of solution, which we call V-valued
solutions and H-valued solutions. The corresponding definitions of uniqueness and
maximality are given in one for both notions of solution. We frame the definition
of the V-valued solutions for an initial condition $\Psi_0 : \Omega \to H$ which is an
$\mathcal{F}_0$-measurable mapping, and for the H-valued solutions a $\Psi_0 : \Omega \to U$ which is
likewise $\mathcal{F}_0$-measurable.


Definition 4 A pair $(\Psi, \tau)$ where $\tau$ is a $\mathbb{P}$-a.s. positive stopping time and $\Psi$ is
a process such that for $\mathbb{P}$-a.e. $\omega$, $\Psi_\cdot(\omega) \in C([0, T]; H)$ and $\Psi_\cdot(\omega) \mathbb{1}_{\cdot \le \tau(\omega)} \in
L^2([0, T]; V)$ for all $T > 0$ with $\Psi_\cdot \mathbb{1}_{\cdot \le \tau}$ progressively measurable in $V$, is said to
be a V-valued local strong solution of the equation (2) if the identity

$$\Psi_t = \Psi_0 + \int_0^{t \wedge \tau} A(s, \Psi_s) \, ds + \int_0^{t \wedge \tau} G(s, \Psi_s) \, dW_s \tag{29}$$

holds $\mathbb{P}$-a.s. in $U$ for all $t \ge 0$.


Remark 2 If $(\Psi, \tau)$ is a V-valued local strong solution of the equation (2), then
$\Psi_\cdot = \Psi_{\cdot \wedge \tau}$.


Remark 3 The progressive measurability condition on $\Psi_\cdot \mathbb{1}_{\cdot \le \tau}$ may look a little
suspect as $\Psi_0$ itself may only belong to $H$ and not $V$, making it impossible for
$\Psi_\cdot \mathbb{1}_{\cdot \le \tau}$ to be even adapted in $V$. We are mildly abusing notation here; what we
really ask is that there exists a process $\Phi$ which is progressively measurable in $V$
and such that $\Phi_\cdot = \Psi_\cdot \mathbb{1}_{\cdot \le \tau}$ almost surely over the product space $\Omega \times [0, \infty)$ with
product measure $\mathbb{P} \times \lambda$ for $\lambda$ the Lebesgue measure on $[0, \infty)$.


Remark 4 If Assumption 1 and (14) hold, then the time integral is well defined in
$U$ and the stochastic integral is well defined as a local martingale in $H$.
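
The reasoning behind Remark 4 can be sketched pathwise. The display below takes (14) in the form $\|A(t,\phi)\|_U^2 + \sum_i \|G_i(t,\phi)\|_H^2 \le c_t K(\phi)[1 + \|\phi\|_V^2]$ and uses a path-dependent constant $C_\omega$, a name introduced here only for illustration, standing for $\sup_{s \le T} (c_s K(\Psi_s(\omega)))^{1/2}$. This is finite because $\Psi_\cdot(\omega)$ is continuous in $H \hookrightarrow U$ and $c_\cdot$ is bounded on $[0, T]$.

```latex
\int_0^{T \wedge \tau} \|A(s, \Psi_s)\|_U \, ds
  \le C_\omega \int_0^{T} \big( 1 + \|\Psi_s \mathbb{1}_{s \le \tau}\|_V \big) \, ds < \infty ,
\qquad
\sum_{i=1}^{\infty} \int_0^{T \wedge \tau} \|G_i(s, \Psi_s)\|_H^2 \, ds
  \le C_\omega^2 \int_0^{T} \big( 1 + \|\Psi_s \mathbb{1}_{s \le \tau}\|_V^2 \big) \, ds < \infty .
```

Both right-hand sides are finite since $\Psi_\cdot(\omega) \mathbb{1}_{\cdot \le \tau(\omega)} \in L^2([0, T]; V)$. The first bound gives integrability of the drift in $U$, and the second is the pathwise square-integrability needed for the localisation procedure of Sect. 2.1 with $\mathcal{H} = H$.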


Definition 5 A pair $(\Psi, \tau)$ where $\tau$ is a $\mathbb{P}$-a.s. positive stopping time and $\Psi$ is
a process such that for $\mathbb{P}$-a.e. $\omega$, $\Psi_\cdot(\omega) \in C([0, T]; U)$ and $\Psi_\cdot(\omega) \mathbb{1}_{\cdot \le \tau(\omega)} \in
L^2([0, T]; H)$ for all $T > 0$ with $\Psi_\cdot \mathbb{1}_{\cdot \le \tau}$ progressively measurable in $H$, is said to
be an H-valued local strong solution of the equation (2) if the identity

$$\Psi_t = \Psi_0 + \int_0^{t \wedge \tau} A(s, \Psi_s) \, ds + \int_0^{t \wedge \tau} G(s, \Psi_s) \, dW_s \tag{30}$$

holds $\mathbb{P}$-a.s. in $X$ for all $t \ge 0$.


Remark 5 The analogues of Remarks 2 and 3 hold for the H-valued solutions.


Remark 6 If Assumption 6 and (23) hold, then the time integral is well defined in
$X$ and the stochastic integral is well defined as a local martingale in $U$.


In the following we use V; H to mean V or H respectively. 


Definition 6 A pair $(\Psi, \Theta)$ such that there exists a sequence of stopping times $(\theta_j)$
which are $\mathbb{P}$-a.s. monotone increasing and convergent to $\Theta$, whereby $(\Psi_{\cdot \wedge \theta_j}, \theta_j)$
is a (V; H)-valued local strong solution of the equation (2) for each $j$, is said to be
a (V; H)-valued maximal strong solution of the equation (2) if for any other pair
$(\Phi, \Gamma)$ with this property then $\Theta \le \Gamma$ $\mathbb{P}$-a.s. implies $\Theta = \Gamma$ $\mathbb{P}$-a.s.


Definition 7 A (V; H)-valued maximal strong solution $(\Psi, \Theta)$ of the equation (2)
is said to be unique if for any other such solution $(\Phi, \Gamma)$, then $\Theta = \Gamma$ $\mathbb{P}$-a.s. and
for all $t \in [0, \Theta)$,

$$\mathbb{P}\left(\{\omega \in \Omega : \Psi_t(\omega) = \Phi_t(\omega)\}\right) = 1.$$


Theorem 2 Suppose that Assumption Set 1 holds. Then for any given
$\mathcal{F}_0$-measurable $\Psi_0 : \Omega \to H$, there exists a unique V-valued maximal strong solution
$(\Psi, \Theta)$ of the equation (2). Moreover at $\mathbb{P}$-a.e. $\omega$ for which $\Theta(\omega) < \infty$, we have
that

$$\sup_{r \in [0, \Theta(\omega))} \|\Psi_r(\omega)\|_U^2 + \int_0^{\Theta(\omega)} \|\Psi_r(\omega)\|_H^2 \, dr = \infty. \tag{31}$$


Theorem 3 Suppose that Assumption Sets 1 and 2 hold. Then for any given
$\mathcal{F}_0$-measurable $\Psi_0 : \Omega \to U$, there exists a unique H-valued maximal strong solution
$(\Psi, \Theta)$ of the equation (2). Moreover at $\mathbb{P}$-a.e. $\omega$ for which $\Theta(\omega) < \infty$, we have
that

$$\sup_{r \in [0, \Theta(\omega))} \|\Psi_r(\omega)\|_U^2 + \int_0^{\Theta(\omega)} \|\Psi_r(\omega)\|_H^2 \, dr = \infty. \tag{32}$$


4 Abstract Solution Method and Application 


In this final section we give the main steps of the proofs of Theorems 2 and 3, 
followed by a brief exposition of how our SALT Navier-Stokes Equation fits into 
this framework. 


4.1 Abstract Solution Method 


Proof (Theorem 2) We suppose that Assumption Set 1 holds and address the
question first for an initial condition $\Psi_0$ which is such that for $\mathbb{P}$-a.e. $\omega$,

$$\|\Psi_0(\omega)\|_H^2 \le M' \tag{33}$$

for some constant $M'$. We work with this bounded initial condition in the first
instance as we shall use local solutions up to first hitting times given in terms of
the initial condition, so this boundedness translates to boundedness of the relevant
process up until these times. As directed in Sect. 3.1 we are to use a Galerkin
Scheme, whereby we consider the equations

$$\Psi^n_t = \Psi^n_0 + \int_0^t P_n A(s, \Psi^n_s) \, ds + \int_0^t P_n G(s, \Psi^n_s) \, dW_s \tag{34}$$

with notation $P_n G(\cdot, \cdot, e_i) := P_n G_i(\cdot, \cdot)$. A local strong solution of this equation is
defined as a pair $(\Psi^n, \tau)$ where $\tau$ is a $\mathbb{P}$-a.s. positive stopping time and $\Psi^n$ is
an adapted process in $V_n$ such that for $\mathbb{P}$-a.e. $\omega$, $\Psi^n_\cdot(\omega) \in C([0, T]; V_n)$ for all
$T > 0$, and the identity

$$\Psi^n_t = \Psi^n_0 + \int_0^{t \wedge \tau} P_n A(s, \Psi^n_s) \, ds + \int_0^{t \wedge \tau} P_n G(s, \Psi^n_s) \, dW_s \tag{35}$$

holds $\mathbb{P}$-a.s. in $V_n$ for all $t \ge 0$. We can conclude that for any fixed $t > 0$ and
$M > 1$, a local strong solution $(\Psi^n, \tau^{M,t}_n)$ of (34) exists for the stopping time $\tau^{M,t}_n$
defined by

$$\tau^{M,t}_n(\omega) := t \wedge \inf\left\{ s \ge 0 : \sup_{r \in [0, s]} \|\Psi^n_r(\omega)\|_U^2 + \int_0^s \|\Psi^n_r(\omega)\|_H^2 \, dr \ge M + \|\Psi^n_0(\omega)\|_U^2 \right\}. \tag{36}$$

This conclusion is reached thanks to Assumption 2, through standard theory in the
finite dimensional Hilbert Space $V_n$, though some care must be taken for the infinite
dimensional Brownian Motion. Understanding that

$$\|\Psi^n_0(\omega)\|_H^2 \le c \|\Psi_0(\omega)\|_H^2 \le c M' \tag{37}$$

coming from (10) and (33), it is clear that

$$\|\Psi^n_0(\omega)\|_U^2 \le \tilde{M} \tag{38}$$

for some $\tilde{M}$ clearly still independent of $n$ and $\omega$. Thus we see the bound

$$\sup_{r \in [0, \tau^{M,t}_n(\omega)]} \|\Psi^n_r(\omega)\|_U^2 + \int_0^{\tau^{M,t}_n(\omega)} \|\Psi^n_s(\omega)\|_H^2 \, ds \le M + \tilde{M} \tag{39}$$

holds true for every $n$ and $\mathbb{P}$-a.e. $\omega$. This boundedness plays a significant role
in our analysis and demonstrates the importance of starting from this bounded
initial condition in the first instance. The motivation for choosing these stopping
times comes from the work of Glatt-Holtz and Ziane in the referenced paper [14].
The authors prove an abstract result which is the central theorem of the paper,
which we simply restate in the Appendix as Theorem 4. In the original paper, the
authors use the traditional Galerkin Scheme for Navier-Stokes (given by the basis
of eigenfunctions of the Stokes Operator) and apply this theorem directly with the
spaces $H_1 := W^{2,2}_\sigma(\mathbb{T}^3; \mathbb{R}^3)$, $H_2 := W^{1,2}_\sigma(\mathbb{T}^3; \mathbb{R}^3)$. We have to take a slight detour
from this method in the case of transport noise due to the condition (47). Translating
this to our framework through $H_1 = H$ and $H_2 = U$, the idea in showing this
condition is to apply the Itô Formula in $U$ to the difference process $\Psi^n - \Psi^m$. When
we simplify down the term arising from the quadratic variation of the stochastic
integral, we must control

$$\sum_{i=1}^{\infty} \|[I - P_m] G_i(s, \Psi^m_s)\|_U^2$$

which we would do via (12) and (10) to bound the above by

$$\frac{1}{\mu_m^2} \sum_{i=1}^{\infty} \|G_i(s, \Psi^m_s)\|_H^2.$$

In order to send this to zero as $m \to \infty$ we use some uniform boundedness of
the term $\sum_{i=1}^{\infty} \|G_i(s, \Psi^m_s)\|_H^2$, which in the case of a Lipschitz operator as in the
original paper is immediate from (39). Where $G_i$ is a differential operator we must
obtain uniform boundedness of the solutions $(\Psi^m)$ in a higher norm, hence the need
for our space $V$ (which in the context of our SALT Navier-Stokes, would then be
$W^{3,2}_\sigma(\mathbb{T}^3; \mathbb{R}^3)$). For this reason we must introduce another step to the proof, whereby
we show that there exist constants $C$, $\tilde{C}$ dependent on $M$, $M'$, $t$ but independent of
$n$ such that for the local strong solution $(\Psi^n, \tau^{M,t}_n)$ of (34),

$$\mathbb{E} \sup_{r \in [0, \tau^{M,t}_n]} \|\Psi^n_r\|_H^2 + \mathbb{E} \int_0^{\tau^{M,t}_n} \|\Psi^n_s\|_V^2 \, ds \le C \left[ \mathbb{E}\left(\|\Psi^n_0\|_H^2\right) + 1 \right] \tag{40}$$

and in particular

$$\mathbb{E} \sup_{r \in [0, \tau^{M,t}_n]} \|\Psi^n_r\|_H^2 + \mathbb{E} \int_0^{\tau^{M,t}_n} \|\Psi^n_s\|_V^2 \, ds \le \tilde{C}. \tag{41}$$

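This result is proven by considering $V_n$ as a Hilbert Space with $H$ inner product,
applying the Itô Formula in this context and using Assumption 3. Equation (41) then
follows from (40) due to (10) so we see the significance of starting from an initial
condition bounded in $H$ and not just $U$ (or at least, square integrable in $H$). From
Assumption 4, along with the requirement that each $P_n$ is an orthogonal projection
in $X$ and $U$ and the conditions (8), (11), (12), we deduce that for any $m < n$ and
$\lambda_m := \min\{\mu_m, \mu_m^2\}$,

$$2 \langle P_n A(t, \phi) - P_m A(t, \psi), \phi - \psi \rangle_U + \sum_{i=1}^{\infty} \|P_n G_i(t, \phi) - P_m G_i(t, \psi)\|_U^2$$
$$\le c_t \tilde{K}_2(\phi, \psi) \|\phi - \psi\|_U^2 - \frac{\kappa}{2} \|\phi - \psi\|_H^2 + \frac{c_t}{\lambda_m} K(\phi, \psi) \left[1 + \|\phi\|_V^2 + \|\psi\|_V^2\right],$$

$$\sum_{i=1}^{\infty} \langle P_n G_i(t, \phi) - P_m G_i(t, \psi), \phi - \psi \rangle_U^2 \le c_t \tilde{K}_2(\phi, \psi) \|\phi - \psi\|_U^4 + \frac{c_t}{\lambda_m} K(\phi, \psi) \left[1 + \|\psi\|_V^2\right].$$

Along with (41) these bounds allow us to conclude that

$$\lim_{m \to \infty} \sup_{n \ge m} \mathbb{E} \left[ \sup_{r \in [0, \tau^{M,t}_m \wedge \tau^{M,t}_n]} \|\Psi^n_r - \Psi^m_r\|_U^2 + \int_0^{\tau^{M,t}_m \wedge \tau^{M,t}_n} \|\Psi^n_s - \Psi^m_s\|_H^2 \, ds \right] = 0 \tag{42}$$

again via an application of the Itô Formula for $V_n$ considered as a Hilbert Space
with $U$ inner product, on the difference process $\Psi^n - \Psi^m$. With similar ideas and
the Assumption 5, we infer that

$$\lim_{S \to 0} \sup_{n \in \mathbb{N}} \mathbb{P}\left( \left\{ \sup_{r \in [0, \tau^{M,t}_n \wedge S]} \|\Psi^n_r\|_U^2 + \int_0^{\tau^{M,t}_n \wedge S} \|\Psi^n_r\|_H^2 \, dr \ge M - 1 + \|\Psi^n_0\|_U^2 \right\} \right) = 0. \tag{43}$$

We then apply Theorem 4 for $H_1 = H$, $H_2 = U$ and claim that the resulting pair
$(\Psi, \tau^{M,t}_\infty)$ satisfies the additional properties that:

– $\Psi_\cdot \mathbb{1}_{\cdot \le \tau^{M,t}_\infty}$ is progressively measurable in $V$;

– For $\mathbb{P}$-a.e. $\omega$, $\Psi_\cdot(\omega) \in C([0, T]; H)$ and $\Psi_\cdot(\omega) \mathbb{1}_{\cdot \le \tau^{M,t}_\infty(\omega)} \in L^2([0, T]; V)$ for
all $T > 0$;

– $\Psi^{n_l} \to \Psi$ holds in the sense that

$$\mathbb{E} \left[ \sup_{r \in [0, \tau^{M,t}_\infty]} \|\Psi^{n_l}_r - \Psi_r\|_U^2 + \int_0^{\tau^{M,t}_\infty} \|\Psi^{n_l}_r - \Psi_r\|_H^2 \, dr \right] \to 0. \tag{44}$$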


Indeed the first two are true from using the uniform boundedness (41) and taking
weakly convergent subsequences in the appropriate spaces, then using uniqueness
of limits in the weak topology and the embeddings $V \hookrightarrow H \hookrightarrow U$ to identify
this limit with $\Psi$. The weak convergence preserves the measurability and so the
progressive measurability of each $\Psi^n$ (from the continuity and adaptedness in $V_n$)
is what gives the result here. The final item is then a simple application of the
dominated convergence theorem. To conclude that $(\Psi, \tau^{M,t}_\infty)$ is a V-valued local
strong solution it only remains to show the identity (29), which is done by taking
limits of the corresponding terms in (35) and applying (15), (16) alongside the
already used assumptions on the $(P_n)$. We take the limit in $X$ and argue that the
identity being satisfied in $X$ is sufficient to conclude the satisfaction of the identity
in $U$, given that all integrals can be constructed in $U$ from the regularity of the
solution.
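
The role of the first hitting time (36) in a Galerkin system such as (34) can be illustrated numerically. The Python sketch below is a toy model and every modelling choice in it is an assumption made for illustration: the drift is a diagonal operator with hypothetical eigenvalues $k^2$, the noise is a single scalar Brownian motion acting multiplicatively with strength `sigma`, the $U$ norm is the Euclidean norm of the coefficients, the $H$ norm is the eigenvalue-weighted norm, and time stepping is explicit Euler-Maruyama. The function name `first_hitting_time` is hypothetical.

```python
import numpy as np


def first_hitting_time(psi0, t_final, big_m, dt=1e-3, sigma=0.5, seed=0):
    """Discrete analogue of the stopping time (36) for a toy Galerkin system.

    Toy dynamics (an assumption of this sketch):
        d psi_k = -k^2 psi_k dt + sigma psi_k dW,  k = 1..n.
    Returns the first grid time s at which
        sup_{r<=s} |psi_r|_U^2 + int_0^s |psi_r|_H^2 dr >= big_m + |psi_0|_U^2,
    capped at t_final.
    """
    rng = np.random.default_rng(seed)
    lam = np.arange(1, psi0.size + 1, dtype=float) ** 2   # toy eigenvalues
    psi = np.array(psi0, dtype=float)
    threshold = big_m + float(np.sum(psi ** 2))
    sup_u = float(np.sum(psi ** 2))    # running supremum of the U norm squared
    int_h = 0.0                        # running integral of the H norm squared
    s = 0.0
    while s < t_final:
        if sup_u + int_h >= threshold:
            return s                   # threshold reached: stop here
        dw = rng.normal(0.0, np.sqrt(dt))
        psi = psi - lam * psi * dt + sigma * psi * dw     # Euler-Maruyama step
        s += dt
        sup_u = max(sup_u, float(np.sum(psi ** 2)))
        int_h += float(np.sum(lam * psi ** 2)) * dt
    return t_final


psi_init = np.ones(4)
tau_deterministic = first_hitting_time(psi_init, 1.0, 1.5, sigma=0.0)
tau_noisy = first_hitting_time(psi_init, 1.0, 1.5)
tau_large_m = first_hitting_time(psi_init, 1.0, 1e6)
```

By construction the tracked quantity stays below $M + \|\Psi^n_0\|_U^2$ strictly before the returned time, which is the discrete counterpart of the bound (39). With a very large $M$ the threshold is never reached and the function returns the cap `t_final`.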

We have now shown the existence of a V-valued local strong solution but for
the bounded initial condition (33). We then show a uniqueness result for such
solutions, which is: suppose that $(\Psi^1, \tau_1)$ and $(\Psi^2, \tau_2)$ are two V-valued local
strong solutions of the equation (2) for a given initial condition $\Psi_0$. Then for all
$s \in [0, \infty)$,

$$\mathbb{P}\left( \left\{ \omega \in \Omega : \Psi^1_{s \wedge \tau_1(\omega) \wedge \tau_2(\omega)}(\omega) = \Psi^2_{s \wedge \tau_1(\omega) \wedge \tau_2(\omega)}(\omega) \right\} \right) = 1.$$

This is proven through applying Assumption 4 in the context of an Itô Formula in
$U$ of the difference process of any two solutions. With this uniqueness in place we
then conclude the results of Theorem 2 but still for the bounded initial condition,
via similar arguments to those used in [14]. To pass to a general initial condition we
consider a sequence of such maximal strong solutions $(\Psi^k, \Theta^k)$ corresponding to
the bounded initial conditions $(\Psi_0 \mathbb{1}_{k \le \|\Psi_0\|_H < k+1})$ and use the maximality on these
pieces to show that the pair $(\Psi, \Theta)$ defined at each time $t \in [0, T]$ and $\omega \in \Omega$ by

$$\Psi_t(\omega) := \sum_{k=1}^{\infty} \Psi^k_t(\omega) \mathbb{1}_{k \le \|\Psi_0(\omega)\|_H < k+1}, \qquad \Theta(\omega) := \sum_{k=1}^{\infty} \Theta^k(\omega) \mathbb{1}_{k \le \|\Psi_0(\omega)\|_H < k+1}$$

is our desired solution for the initial condition $\Psi_0$ (where the limit for $\Psi$ is in reality
just a finite sum). It is clear that for any $\omega$, there exists a $k$ such that $(\Psi(\omega), \Theta(\omega)) =
(\Psi^k(\omega), \Theta^k(\omega))$ so the property (31) follows from the same property in the case
of the bounded initial condition. This rounds off our discussion for the proof of
Theorem 2.


In the case where Assumption Set 2 holds, we then look to use the V-valued local 
strong solutions to obtain an H-valued local strong solution but now just for a U- 
valued initial condition. At this juncture it is well worth addressing the question of 
why we consider these distinct types of solution; that is if we wanted an H-valued 
local strong solution then why not restate Assumption Set 1 for the spaces V as H, 
H as U and U as X? The reason lies in the application to our stochastic Navier- 
Stokes equation, which would then not satisfy the required assumption. This will be 
discussed more explicitly in Sect. 4.2.


Proof (Theorem 3) The idea now is to apply this existence result to the sequence
of initial conditions $(P_n \Psi_0)$, and apply the same Theorem 4 argument to the
corresponding sequence of solutions. From here we now need to suppose that
Assumption Set 2 holds in addition to Assumption Set 1. In the same manner we
start again from a bounded $\Psi_0$, this time such that

$$\|\Psi_0(\omega)\|_U^2 \le \tilde{M}. \tag{45}$$

We could immediately apply Theorem 2 for each initial condition $P_n \Psi_0$, though
we want to apply Theorem 4 for the same spaces $H_1 = H$ and $H_2 = U$. Recall
that we could not do this immediately for a U-valued initial condition and the
sequence of Galerkin solutions, due to the difficulty of gaining suitable control on
the noise term arising from the difference of the projections. In the present scenario we consider
solutions to the unprojected (2) and so we are not burdened with this difficulty.
An application of Theorem 4 would rely on us being able to conclude that each
maximal solution $(\Psi^n, \Theta^n)$ corresponding to the initial condition $P_n \Psi_0$ exists up
until the stopping time (36) (where the $\Psi^n$ notation has now shifted to the above).
This is not immediate from Theorem 2, though we can use similar maximality
arguments to extend these solutions to $\tau^{M,t}_n$ at the cost of some regularity. Indeed
for these extended solutions we have only the regularity of the H-valued solution
but with the additional benefit that $\Psi^n_\cdot(\omega) \mathbb{1}_{\cdot \le \tau^{M,t}_n(\omega)} \in V$ almost everywhere on
the product space $\Omega \times [0, \infty)$. This facilitates the use of Assumption 4 in order
to show the Cauchy property (42), but only via first using an Itô Formula with the
bilinear form $\langle \cdot, \cdot \rangle_{X \times H}$. We must make this step as the identity for these extended
solutions is only satisfied in $X$, hence we cannot use the $U$ inner product. The
stochastic integral though can be constructed in $U$ following from Remark 5, and
the regularity $\Psi^n_\cdot(\omega) \mathbb{1}_{\cdot \le \tau^{M,t}_n(\omega)} \in V$ allows us to call upon the property (8) so that we
can apply Assumption 4. Without the uniform boundedness (41) for these solutions
we need Assumption 9 instead of just 5 to deduce (43). The conclusion of the proof
of Theorem 3 then follows identically to that of Theorem 2, now using Assumption 8 for the
uniqueness part and (24) to show the convergence of the time integral term when
justifying that the limiting pair $(\Psi, \tau^{M,t}_\infty)$ obtained from Theorem 4 is an H-valued
local strong solution.


4.2 SALT Navier-Stokes in the Abstract Framework 


We now briefly comment on the application of this abstract framework to Eq. (4) in
order to conclude the paper. In the previous subsection we have already established
the identification of the spaces

$$V := W^{3,2}_\sigma(\mathbb{T}^3; \mathbb{R}^3), \quad H := W^{2,2}_\sigma(\mathbb{T}^3; \mathbb{R}^3), \quad U := W^{1,2}_\sigma(\mathbb{T}^3; \mathbb{R}^3), \quad X := L^2_\sigma(\mathbb{T}^3; \mathbb{R}^3)$$

at which point we address the question posed in that subsection as to why we need
to make this effort with first the V-valued solutions before showing the existence
for the H-valued ones. That is, why would Assumption Set 1 not hold if we were
to shift the spaces from $V$ to $H$, $H$ to $U$ and $U$ to $X$ (with some modifications of
the reference to $X$ in Assumption Set 1)? One clear answer is in the treatment of the
nonlinear term for (17): for $H = W^{2,2}_\sigma(\mathbb{T}^3; \mathbb{R}^3)$ we have the algebra property of the
Sobolev Space which affords us a bound

$$\|\mathcal{L}_{\phi^n} \phi^n\|_{W^{2,2}} \le c \|\phi^n\|_2 \|\phi^n\|_3$$

using the equivalence of the $\|\cdot\|_2$ norm and the standard $W^{2,2}$ one. In the $W^{1,2}$ norm we do
not have the same luxury and so this nonlinear term cannot be bounded just in terms
of the $W^{1,2}$ and $W^{2,2}$ norms as would be required. It is worth noting the significance
of using the $\langle \cdot, \cdot \rangle_2$ inner product here, as in the same assumption this facilitates the
'integration by parts' property for the Stokes Operator in order to gain the additional
control we require (i.e. the $-\kappa \|\phi^n\|_3^2$ term). There is some additional care required
then to control the noise terms in these inner products, but this is facilitated by using
the same standard cancellation argument that

$$\langle \mathcal{L}_{\xi_i} \phi, \phi \rangle = 0 \tag{46}$$

for $\phi \in W^{1,2}(\mathbb{T}^3; \mathbb{R}^3)$, as well as appreciating that the commutator $[A, B_i]$ is of
second order and commuting through the $B_i$ with $A$ until we reduce to a term of the
form (46). The control (3) allows the $\xi_i$ to be effectively ignored in many of these
computations, by just pulling them out with the supremum. We refer once more
to [6] for the complete details. Of course it is Theorem 3 which translates
into the main result of the paper, Theorem 1, though it is also worth noting that having
shown Theorem 2 in this context we can also say something about the retained
regularity of our solutions coming from a more regular initial condition. To really
make this point we would have to show that the maximal times for the different notions of
solution are in fact the same, and this is to be addressed in [6].
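
Two of the identities used above can be written out explicitly. The sketch below is a formal computation under simplifying assumptions: $\phi$ is taken smooth (for instance in a finite span of the $(a_k)$), $\xi_i$ is divergence-free as assumed in Sect. 2.2, and boundary terms vanish by periodicity. The first line is the cancellation (46); the second is the 'integration by parts' property of the Stokes Operator in the $\langle \cdot, \cdot \rangle_2$ inner product.

```latex
% Cancellation (46): transport by a divergence-free field is skew in L^2.
\langle \mathcal{L}_{\xi_i} \phi, \phi \rangle
  = \sum_{j,k=1}^{3} \int_{\mathbb{T}^3} \xi_i^j \, \partial_j \phi^k \, \phi^k \, d\lambda
  = \frac{1}{2} \sum_{j=1}^{3} \int_{\mathbb{T}^3} \xi_i^j \, \partial_j |\phi|^2 \, d\lambda
  = -\frac{1}{2} \int_{\mathbb{T}^3} \Big( \sum_{j=1}^{3} \partial_j \xi_i^j \Big) |\phi|^2 \, d\lambda
  = 0 .
% Stokes Operator in the <.,.>_2 inner product, using A a_k = \lambda_k a_k.
\langle A \phi, \phi \rangle_2
  = \langle A (A \phi), A \phi \rangle
  = \sum_{k} \lambda_k^3 \phi_k^2
  = \|\phi\|_3^2 .
```

The second identity shows how the viscous contribution $-A \phi^n$ produces a term $-2 \|\phi^n\|_3^2$ in an estimate of the type (17), which is the source of the additional control in the $V = W^{3,2}_\sigma$ norm.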


Appendix 


Here we state [14, Lemma 5.1]. 


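Theorem 4 Let $H_1 \subset H_2$ be Hilbert Spaces with continuous embedding, and $(\Psi^n)$
be a sequence of processes such that for $\mathbb{P}$-a.e. $\omega$, $\Psi^n(\omega) \in C([0, T]; H_2) \cap
L^2([0, T]; H_1)$ which is a Banach Space with norm

$$\|\Psi\|_{X_T} := \left( \sup_{r \in [0, T]} \|\Psi_r\|_{H_2}^2 + \int_0^T \|\Psi_r\|_{H_1}^2 \, dr \right)^{1/2}.$$

For some fixed $M > 1$ and $t > 0$ define the stopping times

$$\tau^{M,t}_n(\omega) := t \wedge \inf\left\{ s \ge 0 : \|\Psi^n(\omega)\|_{X_s}^2 \ge M + \|\Psi^n_0(\omega)\|_{H_2}^2 \right\}$$

and suppose that

$$\lim_{m \to \infty} \sup_{n \ge m} \mathbb{E} \left[ \|\Psi^n - \Psi^m\|_{X_{\tau^{M,t}_m \wedge \tau^{M,t}_n}}^2 \right] = 0 \tag{47}$$

and

$$\lim_{S \to 0} \sup_{n \in \mathbb{N}} \mathbb{P}\left( \left\{ \|\Psi^n\|_{X_{\tau^{M,t}_n \wedge S}}^2 \ge M - 1 + \|\Psi^n_0\|_{H_2}^2 \right\} \right) = 0.$$

Then there exists a stopping time $\tau^{M,t}_\infty$, a subsequence $(\Psi^{n_l})$ and process $\Psi_\cdot =
\Psi_{\cdot \wedge \tau^{M,t}_\infty}$ such that:

– $\mathbb{P}\left( \left\{ 0 < \tau^{M,t}_\infty \le \tau^{M,t}_{n_l} \right\} \right) = 1$;

– For $\mathbb{P}$-a.e. $\omega$, $\Psi(\omega) \in C([0, \tau^{M,t}_\infty(\omega)]; H_2) \cap L^2([0, \tau^{M,t}_\infty(\omega)]; H_1)$;

– For $\mathbb{P}$-a.e. $\omega$, $\Psi^{n_l}(\omega) \to \Psi(\omega)$ in
$\left( C([0, \tau^{M,t}_\infty(\omega)]; H_2) \cap L^2([0, \tau^{M,t}_\infty(\omega)]; H_1), \|\cdot\|_{X_{\tau^{M,t}_\infty(\omega)}} \right)$.


Abstract The mathematical models and numerical simulations reported here are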
motivated by satellite observations of horizontal gradients of sea surface tempera- 
ture and salinity that are closely coordinated with the slowly varying envelope of the 
rapidly oscillating waves. This coordination of gradients of fluid material properties 
with wave envelopes tends to occur when strong horizontal buoyancy gradients are 
present. The nonlinear models of this coordinated movement presented here may 
provide future opportunities for the optimal design of satellite imagery that could 
simultaneously capture the dynamics of both waves and currents directly. 

The model derived here appears in two levels of approximation: first for rapidly 
oscillating waves, and then for their slowly varying envelope (SVE) approximation 
obtained by using the WKB approach. The WKB wave-current-buoyancy inter- 
action model derived here for a free surface with significant horizontal buoyancy 
gradients indicates that the mechanism for the emergence of these correlations is 
the ponderomotive force of the slowly varying envelope of rapidly oscillating waves 
acting on the surface currents via the horizontal buoyancy gradient. In this model, 
the buoyancy gradient appears explicitly in the WKB wave momentum, which in 
turn generates density-weighted potential vorticity whenever the buoyancy gradient 
is not aligned with the wave-envelope gradient. 


Keywords Nonlinear water waves - Free surface fluid dynamics - Geometric


1 Introduction


1.1 Submesoscale Sea Surface Dynamics 


Capabilities in sea surface observation have been improving rapidly during the past 
two decades [1]. In particular, new high-resolution satellite observation capabilities 
are revealing sea surface features seen for the first time at submesoscale spatial 
scales of 100 m to 10 km and time scales of hours to weeks. Invariably, the new
satellite imagery reveals a plethora of coupled dynamical surface phenomena, 
including currents, spiral filaments, flotsam patterns, jets and fronts, some of which 
are detected indirectly through gradients of sea surface temperature, salinity or 
colour, in addition to the imagery [5, 10, 13, 20, 26]. 

The new capabilities in sea surface observation are still developing. For example, 
the impending Surface Water and Ocean Topography (SWOT) mission will map the
ocean surface mesoscale sea surface height field, as well as a large fraction of the 
associated submesoscale field, including buoyancy fronts [17]. A sample of this type 
of submesoscale data taken from [5] is shown in Figs. 1 and 2. 

The coming new age of higher-resolution upper ocean observations will present 
a formidable array of challenges for the next generation in data management, 
computational simulation and mathematical modelling. This paper will offer a 
mathematical modelling framework that is flexible enough to admit uncertainty 
quantification through stochastic modelling and analysis, applied in concert with 
high-resolution observations, computational simulations, and stochastic data 
assimilation for large data sets. This framework involves decomposing the surface motion 
into a two-dimensional horizontal flow map representing transport by the current 
acting on a one-dimensional vertical flow map representing wave-like motion of 
the elevation. This composition-of-maps modelling framework is described and 
applied to model sea-surface dynamics in two deterministic examples in Sect. 2 of 
the present paper. 


Fig. 1 Wave activity in the submesoscale ocean is dynamically complex, as illustrated in this 
figure showing the zoomed image of a submesoscale sea surface elevation, seen in Envisat 
MERIS glitter observations. This image shows the wave elevation tracking a cyclonic eddy 
visible in the sea surface glitter observations. The pixel resolution is 250 m. This glitter image 
demonstrates the complex, highly coordinated dynamical forms taken in wave-current interaction 
on the submesoscale sea surface. In particular, notice the instabilities developing in the eddy's outer 
boundary. Image courtesy of B. Chapron 


(a) Sea surface temperature near the Gulf Stream, on April 1st 2010, from the 
Envisat AATSR measurements. 

(b) Sea surface glitter contrasts near the Gulf Stream, on April 1st 2010, from 
the Envisat MERIS observations. 

Fig. 2 Comparison of the two images above demonstrates the emergent coherence between sea 
surface temperature and the glitter patterns visible from satellite imagery. The thermal fronts visible 
are dynamic, and sea surface roughness is most obvious along the strongest fronts. Discussions 
of the interpretation of sun glitter measurements are given in [5, 20, 26]. Images courtesy of B. 
Chapron 


Emergent Coherence (EC) Combining high-resolution thermal data (buoyancy) 
with glitter data for the wave elevation as in Fig. 2 has recently revealed yet another 
interesting feature of submesoscale dynamics. Namely, the observed submesoscale 
data show extremely high correlations of wave, current and thermal properties 
[5]. This emergent spatial-temporal coherence of dynamic and thermal properties 
presents a significant challenge for dynamical submesoscale modelling. Accepting 
this challenge, the aim of this paper is to derive a mathematical model of nonlinear 
sea surface dynamics whose solutions also demonstrate the emergent coherence 
observed in combining different types of submesoscale data. This paper derives new 
two-dimensional equations that show the emergent coherence (EC) seen in the sea 
surface features appearing in Fig. 2. The EC behaviour produced by the equations 
derived here is demonstrated in Fig. 3, which shows a snapshot of the coherence 
of buoyancy and wave amplitude distributions in the dynamics of divergence-free 
two-dimensional flow acting on free surface vertical elevation wave features moving 
under gravity. In the model equations, the horizontal buoyancy gradients mediate the 
interactions between the vertical elevation waves and the horizontal currents. The 
equations of motion represent the current as a time-dependent, area-preserving map 
of the horizontal plane into itself and the waves as the composition of the horizontal 
flow map with a time-dependent vertical elevation map. Thus, the model involves a 
dynamical composition of maps (CoM). 


2 Submesoscale Thermal Wave-Current Dynamics on a Free 
Surface 


2.1 Surface Waves as Symmetry-Breaking Features of Local 
Force Imbalances 


Waves are propagating symmetry-breaking features that signify the response to 
a local imbalance of forces. Thus, from the viewpoint of satellite oceanography, 
observations of waves—defined as propagating sea surface elevation features— 
signify processes at the surface or below the surface whose presence introduces 
forces that locally break the symmetry of the surface. The sea surface would 
otherwise follow the stable global gravitational balance of the geoid, which we 
regard here as being spherical. Thus, waves arise from a spatially local imbalance 
of forces in the neighbourhood of a stable equilibrium. The propagating feature of 
relevance here is the wave elevation, measured as the local departure of the surface 
level in the direction normal to its equilibrium mean level. The symmetry broken 
here is the invariance of the sea surface under spatial translations tangent to the 
equilibrium surface level, also known as the local horizontal direction. Hence, from 
the viewpoint of satellite oceanography, sea surface waves are observed as local 
vertical displacements of the otherwise horizontal motion of the ocean currents on 
the sea surface. From the mathematical modelling viewpoint, sea surface waves 
are local vertical oscillations of the horizontal surface that are carried along by the 
horizontal current flow, envisaged as a smooth invertible time-dependent map of the 
horizontal surface into itself. This is the composition of maps (CoM) modelling 
approach for describing the dynamics of horizontal fluid flows (currents) acting 
on oscillating vertical elevations (waves). Since the surface current velocity, its 
advected material properties and the wave elevation are all that can be observed in 
satellite oceanography, the task in three-dimensional ocean modelling for satellite 
oceanography devolves into determining the dynamical surface features that are 
produced by the three-dimensional flow processes below the surface arising from 
e.g., bathymetry, stratification, rotation, Langmuir circulation, and thermal effects 
such as frontogenesis. The dynamics of the surface signatures of these 
three-dimensional flow processes, as well as the effects of air-sea interactions on the 
surface, need to be interpreted in order to understand what satellite oceanography 
observes. 


Fig. 3 This is a $512^2$ snapshot of the CoM equations in the SVE approximation in the potential 
vorticity form in (45). The four panels display the following distributions: modified potential 
vorticity Q-PV in (43) (top left), buoyancy (top right), square of the wave amplitude (bottom left) 
and wave phase (bottom right) in the numerical simulation of the dynamics of divergence-free flow 
on a free surface moving under gravity. The simulation began with a spin-up period with zero wave 
amplitude. After the spin-up period, as explained in Sect. 3, a checker-board pattern of finite wave 
amplitude with zero phase was introduced and the simulation was resumed. The 'mixing' of these 
wave patterns eventually brought them into coherence with the spatial distributions of thermal 
properties and potential vorticity. These features show an emergent coherence in patterns similar 
to those seen in the corresponding high-resolution satellite data in Fig. 2 


2.2 A Tale of Two Maps: Currents and Waves 


Story Line Waves on the surface of the ocean are modelled here as a composition 
of two smooth invertible maps describing the temporal evolution and advection 
of two degrees of dynamical freedom interacting at widely separated space-time 
scales. In this composition of maps (CoM) approach, the waves are regarded as 
local vertical disturbances that rapidly oscillate as they are swept along by the broad, 
slowly changing horizontal currents. Thus, the slow current motion is a Lagrangian 
coordinate for the rapid wave oscillations. This wide separation in space-time 
scales invokes the classical WKB description. The standard WKB approach seeks a 
rapidly oscillating wave packet solution whose phase-averaged amplitude possesses 
a slowly varying envelope (SVE) spatially. The WKB method is often applied via 
a variational principle because in a variational setting the phase average naturally 
leads to an adiabatic invariant known as the wave action density, cf. for example, [3] 
for a review of the WKB or SVE method in fluid dynamics. Here we will follow the 
variational approach of [4, 11] guided by the classical work of [22, 24, 25]. 


Submesoscale Sea-Surface Motion: Composition of Two Time-Dependent 
Maps The position and velocity of fluid parcels in motion under gravity on a 
2D free surface embedded in $\mathbb{R}^3$ have both horizontal and vertical components. The 
corresponding flow maps are denoted as the map $\phi_t : \mathbb{R}^2 \to \mathbb{R}^2$ for the horizontal 
current flow, and as the composite map $\zeta_t\phi_t$ for the vertical elevation of the waves 
as a function of time and position in $\mathbb{R}^2$. The flow lines of these two components of 
the flow map of a free surface can be written as 

$$\mathbf{r}_t = \phi_t\mathbf{r}_0 \quad\text{and}\quad z_t = \zeta_t(\phi_t\mathbf{r}_0) =: \zeta_t(\mathbf{r}_t),$$

where $\mathbf{r}_t = (x_t, y_t) \in \mathbb{R}^2$ is the horizontal position along the flow at time $t$ and 
$\zeta_t(\mathbf{r}_t)$ is the vertical elevation at horizontal position $\mathbf{r}_t$ at time $t$, starting at position 
$\mathbf{r}_0$ at time $t = 0$. Thus, one may say that the initial position of the flow line, $\mathbf{r}_0$, is 
a Lagrangian coordinate for the horizontal motion, and the horizontal motion is a 
Lagrangian coordinate for the vertical motion. That is, the 'footpoint' at time $t$ of 
the vertical component of the flow map $\zeta_t$ is located in the horizontal plane along 
a curve $\phi_t\mathbf{r}_0$ parameterised by time $t$. Likewise, one can simply say that the wave 
dynamics is advected, or swept along, by the current dynamics. 

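Hence, the corresponding horizontal and vertical components of velocity along a 
stream line $\mathbf{r}_t$ in the horizontal plane are defined by 

$$\frac{d\mathbf{r}_t}{dt} = \frac{d}{dt}\phi_t\mathbf{r}_0 = \hat{\mathbf{v}}_t(\phi_t\mathbf{r}_0) =: \hat{\mathbf{v}}_t(\mathbf{r}_t), \quad\text{so}\quad \hat{\mathbf{v}}_t = \dot{\phi}_t\phi_t^{-1}, \quad\text{and}$$

$$\frac{dz_t}{dt} =: \hat{w}(\mathbf{r}_t, t) = \frac{d}{dt}\big(\zeta_t\phi_t\mathbf{r}_0\big) = \partial_t\zeta_t(\mathbf{r}_t) + \nabla_{\mathbf{r}}\zeta_t(\mathbf{r}_t)\cdot\hat{\mathbf{v}}_t(\mathbf{r}_t).$$

That is, in the dynamics of free surface flow, the vertical velocity $\hat{w}(\mathbf{r}, t)$ at a given 
Eulerian point $\mathbf{r}$ and time $t$ is related to the wave elevation $\zeta(\mathbf{r}, t)$ and horizontal 
velocity $\hat{\mathbf{v}}(\mathbf{r}, t)$ at that point by 

$$\hat{w}(\mathbf{r}, t) = \partial_t\zeta(\mathbf{r}, t) + \hat{\mathbf{v}}(\mathbf{r}, t)\cdot\nabla_{\mathbf{r}}\zeta(\mathbf{r}, t).$$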
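
The relation between the vertical velocity following a parcel and the Eulerian fields can be checked numerically. The following is a minimal sketch, not taken from the paper: it assumes a rigid rotation for the horizontal flow map and a made-up travelling-wave elevation (the names `flow_map`, `velocity` and `elevation` and all constants are illustrative), and compares the time derivative of the elevation along a flow line with the Eulerian expression $\partial_t\zeta + \hat{\mathbf{v}}\cdot\nabla_{\mathbf{r}}\zeta$ using centred finite differences.

```python
import math

def flow_map(r0, t, omega=0.5):
    """Rigid rotation about the origin: an area-preserving horizontal flow map phi_t."""
    x0, y0 = r0
    c, s = math.cos(omega * t), math.sin(omega * t)
    return (c * x0 - s * y0, s * x0 + c * y0)

def velocity(r, omega=0.5):
    """Eulerian velocity v = omega * (-y, x) generated by the rigid rotation."""
    x, y = r
    return (-omega * y, omega * x)

def elevation(r, t):
    """Hypothetical travelling-wave elevation zeta(r, t)."""
    x, y = r
    return 0.1 * math.sin(3.0 * x + 2.0 * y - 4.0 * t)

def w_lagrangian(r0, t, h=1e-5):
    """d/dt of zeta_t(phi_t r0), by a centred difference along the flow line."""
    zp = elevation(flow_map(r0, t + h), t + h)
    zm = elevation(flow_map(r0, t - h), t - h)
    return (zp - zm) / (2 * h)

def w_eulerian(r, t, h=1e-5):
    """partial_t zeta + v . grad zeta, evaluated at the fixed point r."""
    x, y = r
    dzdt = (elevation(r, t + h) - elevation(r, t - h)) / (2 * h)
    dzdx = (elevation((x + h, y), t) - elevation((x - h, y), t)) / (2 * h)
    dzdy = (elevation((x, y + h), t) - elevation((x, y - h), t)) / (2 * h)
    vx, vy = velocity(r)
    return dzdt + vx * dzdx + vy * dzdy

r0, t = (1.0, 0.3), 0.7
rt = flow_map(r0, t)          # current position of the parcel labelled by r0
print(w_lagrangian(r0, t), w_eulerian(rt, t))
```

The two printed values should agree up to finite-difference error, which is the content of the kinematic relation above for this particular choice of flow and elevation.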


In terms of these fluid variables, one could propose a Hamilton's principle for 
wave-current interaction of a free surface by following [8] for the variational modelling 
framework and applying [24, 7] for the potential energy to find¹ 

$$0 = \delta S = \delta\int_a^b \ell(\hat{\mathbf{v}}, \zeta, D, \rho)\,dt = \delta\int_a^b\!\!\int \left[\left(\frac{1}{2}\Big(|\hat{\mathbf{v}}|^2 + \sigma^2\big(\partial_t\zeta + \hat{\mathbf{v}}\cdot\nabla_{\mathbf{r}}\zeta\big)^2\Big) - \frac{\zeta^2}{2\mathrm{Fr}^2}\right)D\rho - p(D - 1)\right] d^2r\,dt. \tag{1}$$

To interpret the variational principle proposed in (1) we rewrite its Lagrangian 
as a sum of an Eulerian spatial integral and an integral over material mass elements 
$d^2r_0 = D\rho\,d^2r$ which follow the paths of the horizontal fluid motion, 
$\mathbf{r}(\mathbf{r}_0, t) = \phi_t\mathbf{r}_0$, 

$$0 = \delta S = \delta\int_a^b\!\!\int \left[\frac{D\rho}{2}|\hat{\mathbf{v}}|^2 - p(D - 1)\right] d^2r\,dt + \delta\int_a^b\!\!\int \left[\frac{\sigma^2}{2}\left(\frac{d\zeta}{dt}\right)^2 - \frac{\zeta^2}{2\mathrm{Fr}^2}\right] d^2r_0\,dt. \tag{2}$$


¹ In [8] the potential energy was linear in $\zeta$. This linearity neglected the restoring force due to 
vertical pressure gradient via Archimedes' principle. Adopting the potential energy quadratic in $\zeta$ 
regains this restoring force. 
Variations of the first summand in (2) at fixed spatial position ($\mathbf{r}$) yield the Euler 
fluid equations for 2D divergence-free flow with advected buoyancy, 
$\rho(\mathbf{r}, t) = \rho_t(\phi_t\mathbf{r}_0) = \rho_0(\mathbf{r}_0)$, 

$$\partial_t\hat{\mathbf{v}} + \hat{\mathbf{v}}\cdot\nabla_{\mathbf{r}}\hat{\mathbf{v}} = -\frac{1}{\rho}\nabla_{\mathbf{r}}p \quad\text{with}\quad \nabla_{\mathbf{r}}\cdot\hat{\mathbf{v}} = 0. \tag{3}$$

Variations of the second summand in (2) taken at fixed mass element ($\mathbf{r}_0$) yield 
equations for vertical harmonic oscillations of the elevation of each material mass 
element 

$$\sigma^2\partial_t^2\zeta(\mathbf{r}_0, t) = \sigma^2\frac{d^2\zeta}{dt^2}\bigg|_{\mathbf{r}_0} = -\frac{\zeta(\mathbf{r}_0, t)}{\mathrm{Fr}^2}. \tag{4}$$

The wave-elevation equation in (4) is unrealistic, though, because it implies that 
fluid mass elements with different labels ($\mathbf{r}_0$) would be oscillating in phase and all 
with the same frequency, as they follow the flow of the Euler fluid equations (3) for 
2D divergence-free flow with advected buoyancy. This unrealistic synchronisation 
and resonance can be removed by including the inertia of each mass element. This 
can be done by including the initial buoyancy of each mass element, as 

$$\sigma^2\partial_t^2\zeta(\mathbf{r}_0, t) = \sigma^2\frac{d^2\zeta}{dt^2}\bigg|_{\mathbf{r}_0} = -\frac{\rho_{\mathrm{ref}}}{\rho_0(\mathbf{r}_0)}\frac{\zeta(\mathbf{r}_0, t)}{\mathrm{Fr}^2}. \tag{5}$$
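
To see how the buoyancy weighting in (5) removes the synchronisation implied by (4), one can compare the oscillation frequencies of two mass elements with slightly different initial buoyancy. This is a minimal sketch under stated assumptions: the parameter values (`sigma2`, `fr2`, the two values of `rho0`) are illustrative dimension-free numbers chosen here, not values from the paper, and `parcel_frequency` is a hypothetical helper name.

```python
import math

def parcel_frequency(rho0, rho_ref=1.0, sigma2=1e-4, fr2=1e2):
    """Angular frequency of sigma^2 * zeta'' = -(rho_ref / rho0) * zeta / Fr^2."""
    return math.sqrt(rho_ref / (rho0 * sigma2 * fr2))

def elevation(t, zeta0, rho0, **params):
    """Solution with zeta(0) = zeta0 and zero initial vertical velocity."""
    return zeta0 * math.cos(parcel_frequency(rho0, **params) * t)

w_light = parcel_frequency(rho0=0.98)   # slightly lighter mass element
w_heavy = parcel_frequency(rho0=1.02)   # slightly heavier mass element

# Time after which the two elements have drifted half a cycle out of phase
t_dephase = math.pi / (w_light - w_heavy)
print(w_light, w_heavy, t_dephase)
```

With equal buoyancy the two frequencies coincide and the elements stay in phase, as in (4); any buoyancy difference makes them drift apart at a rate set by the frequency difference.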
At this point in our reasoning, we have not yet considered the differences in space 
and time scales between the fluid flow and the wave activity. In what follows, we 
will use the simple composition-of-maps idea explained here along with estimates of 
relative space and time scales to investigate the applicability of this class of models. 
To improve the applicability of the model comprising (3) and (5) for describing the 
effects of currents on waves, we will derive a related model in the slowly varying 
envelope (SVE) approximation. The SVE approximation allows considerations of 
current and wave dynamics at the same space and time scales. 

The comparisons of the simulated solutions of these CoM models with the 
observations in Figs. 1, 2, 3, and 4 above indicate that these models can indeed 
produce results that match some aspects of observed features. However, these 
models are not derived from three-dimensional fluid equations. Instead, they are 
derived from the simple solution Ansatz in Hamilton’s principle that the vertical 
elevation of the sea surface wave activity is carried by divergence-free horizontal 
fluid motion. The latter assumption is a weakness of the current approach, because 
it precludes effects of vertical up-welling and down-welling, which are observed 
to occur along with convergence and divergence of currents [10]. The equations 
derived here are also not associated with classical surface wave equations such as the 
nonlinear Schrödinger (NLS) equation, or other celebrated surface wave equations. 
This departure from the classical water wave literature may be regarded as another 
weakness of the current approach. 
Fig. 4 These $512^2$ snapshots of the CoM simulation in the vorticity form (25) show the elevation 
$\zeta$ in the left panel and the density-weighted vertical velocity $\tilde{w}$ on the right. The snapshots are taken 
at the same time and with the same fluid spin-up initial conditions as the snapshots of the simulation 
of the SVE approximate equations presented in Fig. 3. Overlaying the two figures demonstrates that 
the resolved features in the $\zeta$ distribution in this figure of CoM results are bounded by the SVE 
wave envelope distribution $|a|^2$ in Fig. 3 


Estimating Parameters $\sigma^2$ and $\mathrm{Fr}^2$ for Satellite Observations The Lagrangian 
$\ell(\hat{\mathbf{v}}, \zeta, D, \rho)$ in (1) represents the dimension-free difference of the kinetic and 
potential energies, augmented by the incompressibility constraint imposed by the 
Lagrange multiplier $p$. Two dimension-free parameters ($\sigma^2$ and $\mathrm{Fr}^2$) appear in 
this Hamilton's principle. The coefficient $\sigma^2 = ([H]/[L])^2$ in formula (1) is the 
square of the vertical-to-horizontal aspect ratio. Typically, for satellite observations 
of submesoscale dynamics one finds 

$$[H] \approx (3\times10^{-4} \text{ to } 3\times10^{-3})\,\mathrm{km} \quad\text{and}\quad [L] \approx (10^{-1} \text{ to } 10)\,\mathrm{km}, \quad\text{so}\quad \sigma^2 \approx 10^{-3} \text{ to } 10^{-9} \ll 1$$

for the squared aspect ratio $\sigma^2 \ll 1$ of the height of the waves $[H]$ relative to 
the breadth $[L]$ of the two-dimensional domain. The squared 'Froude number' $\mathrm{Fr}^2$ 
in this regime is estimated by the square of the ratio of horizontal and vertical 
frequency scales at the sea surface, 

$$\mathrm{Fr}^2 := \left(\frac{[V]}{N[L]}\right)^2 \approx 1 \text{ to } 10^4. \tag{6}$$
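
The range quoted above for the squared aspect ratio follows from simple arithmetic on the stated height and breadth scales. The short sketch below only reproduces that arithmetic; the variable names are illustrative.

```python
# Scales quoted in the text, expressed in kilometres
H_min, H_max = 3e-4, 3e-3   # wave height [H]: 0.3 m to 3 m
L_min, L_max = 1e-1, 1e1    # horizontal breadth [L]: 100 m to 10 km

sigma2_max = (H_max / L_min) ** 2   # tallest waves on the narrowest domain
sigma2_min = (H_min / L_max) ** 2   # smallest waves on the broadest domain
print(f"sigma^2 ranges from {sigma2_min:.1e} to {sigma2_max:.1e}")
```

The two extremes come out just below $10^{-9}$ and $10^{-3}$, in line with the order-of-magnitude range stated for $\sigma^2$.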


Here, the horizontal velocity on the sea surface is taken as $[V] = (0.1 \text{ to } 1)$ m/s 
and $[H] = (0.3 \text{ to } 3)$ m. According to [9], the Brunt-Väisälä buoyancy frequency in 
the sea surface wave regime is given by $N \approx (10^{-2} \text{ to } 10^{-4})$/s. The ratio of 
horizontal and vertical frequency scales at the sea surface in (6) is selected for 
use later in applying the slowly varying envelope (SVE) wave approximation in 
Sect. 2.4. Hence, we estimate that the squared product of the 'Froude number' and 
aspect ratio for satellite observations of the sea surface can reasonably be estimated 
over the range 
v] \2 
Oo Fr? := (an) x 107? — 10. (7) 


Modelling the Dynamic Effects of Surface Density Variations As mentioned 
earlier, the observed oscillations of sea surface waves are by no means simultaneous 
across the whole domain, although the observations show that they are indeed 
coordinated spatially with the buoyancy of the fluid. To correct this solution 
behaviour, the kinetic energy and potential energy need to be de-synchronised from 
the buoyancy. 

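The dynamic dependence of the wave kinetic energy on the density is physically 
required. However, to de-synchronise the wave oscillations we can introduce a 
constant reference density $\rho_{\mathrm{ref}}$ into the wave potential energy, by writing 

$$\frac{\zeta^2}{\mathrm{Fr}^2} \;\to\; \frac{\rho_{\mathrm{ref}}}{\rho}\frac{\zeta^2}{\mathrm{Fr}^2} \quad\text{with}\quad \frac{\rho_{\mathrm{ref}}}{\rho} \ \text{of order } O(1). \tag{8}$$

The quantity $\rho_{\mathrm{ref}}$ is a constant reference density, and the density ratio $(\rho_{\mathrm{ref}}/\rho) = O(1)$. 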

The density dependence imposed here is important in the dynamics that follows 
from Hamilton’s principle. Substituting the relations in (8) into Hamilton’s principle 
in Eq. (1) leads to the following dimension-free action integral, 


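$$0 = \delta S = \delta\int_a^b \ell(\hat{\mathbf{v}}, \zeta, D, \rho)\,dt = \delta\int_a^b\!\!\int \left[\left(\frac{1}{2}\Big(|\hat{\mathbf{v}}|^2 + \sigma^2\big(\partial_t\zeta + \hat{\mathbf{v}}\cdot\nabla_{\mathbf{r}}\zeta\big)^2\Big) - \frac{\rho_{\mathrm{ref}}}{\rho}\frac{\zeta^2}{2\mathrm{Fr}^2}\right)D\rho - p(D - 1)\right] d^2r\,dt. \tag{9}$$

The advected quantities $D(\mathbf{r}, t)\,d^2r$ and $\rho(\mathbf{r}, t)$ evolve via push-forward by the 
horizontal flow map, $\phi_t$. For example, $D_t\,d^2r_t = \phi_{t*}(D_0\,d^2r_0)$ and $\rho_t = \phi_{t*}\rho_{\mathrm{ref}}$ 
denote, respectively, evolution of the determinant of the Lagrange to Euler map and 
of the local scalar value of the mass density. Conservation of mass is then expressed 
as the push-forward relation, $D_t\rho_t\,d^2r_t = \phi_{t*}(D_0\rho_{\mathrm{ref}}\,d^2r_0)$. The pressure $p$ in (9) 
acts as a Lagrange multiplier to enforce conservation of area, so that $D_t = 1 = \phi_{t*}D_0$, 
and the horizontal flow is incompressible, which implies that the horizontal 
velocity is divergence-free, i.e., $\mathrm{div}_{\mathbf{r}}\hat{\mathbf{v}}(\mathbf{r}, t) = 0$. Taking variations of the action 
integral (9) yields the following set of equations, 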
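$$\delta\hat{\mathbf{v}}:\quad \frac{\delta\ell}{\delta\hat{\mathbf{v}}} = D\rho\big(\hat{\mathbf{v}}\cdot d\mathbf{r} + \sigma^2\hat{w}\,d\zeta\big)\otimes d^2r =: D\rho\,\mathbf{V}\cdot d\mathbf{r}\otimes d^2r, \quad\text{with}\quad \hat{w} := \partial_t\zeta + \hat{\mathbf{v}}\cdot\nabla_{\mathbf{r}}\zeta,$$

$$\delta\zeta:\quad \partial_t\big(\sigma^2D\rho\,\hat{w}\big) + \mathrm{div}_{\mathbf{r}}\big(\sigma^2D\rho\,\hat{w}\,\hat{\mathbf{v}}\big) = -D\,\frac{\rho_{\mathrm{ref}}\,\zeta}{\mathrm{Fr}^2},$$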
(10) 
ôl p A2 Pref t? Baie - «es 
ôD: = E — = — 
ip o a ea Pee oe, 
ôl D ~ ~ ~ Pret? 
oe aap 202) =: ie z sE 
Oe gaa Peu =De; RPR A 


$$\delta p:\quad D - 1 = 0 \;\Longrightarrow\; \mathrm{div}_{\mathbf{r}}\hat{\mathbf{v}} = 0.$$
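From their definitions as advected quantities, one also knows that $D$ and $\rho$ satisfy 

$$\begin{aligned}
(\partial_t + \mathcal{L}_{\hat{\mathbf{v}}})(D\,d^2r) = 0 \;&\Longrightarrow\; \partial_tD + \mathrm{div}_{\mathbf{r}}(D\hat{\mathbf{v}}) = 0 \quad\text{with}\quad D = 1,\\
(\partial_t + \mathcal{L}_{\hat{\mathbf{v}}})\rho = 0 \;&\Longrightarrow\; \partial_t\rho + \hat{\mathbf{v}}\cdot\nabla_{\mathbf{r}}\rho = 0,
\end{aligned}\tag{11}$$

where $\mathcal{L}_{\hat{\mathbf{v}}}$ denotes the Lie derivative operation along the horizontal velocity vector 
field, $\hat{\mathbf{v}}$, which provides coordinate-free brevity in the notation. 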


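Theorem 1 (Kelvin-Noether Circulation Theorem) Use of the Euler-Poincaré 
(EP) theorem yields the following Kelvin circulation theorem 

$$\frac{d}{dt}\oint_{c(\hat{\mathbf{v}})}\big(\hat{\mathbf{v}}\cdot d\mathbf{r} + \sigma^2\hat{w}\,d\zeta\big) = -\oint_{c(\hat{\mathbf{v}})}\frac{1}{\rho}\,dp. \tag{12}$$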


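Proof The Euler-Poincaré (EP) theorem in this case yields 

$$(\partial_t + \mathcal{L}_{\hat{\mathbf{v}}})\frac{\delta\ell}{\delta\hat{\mathbf{v}}} = \frac{\delta\ell}{\delta D}\diamond D + \frac{\delta\ell}{\delta\rho}\diamond\rho. \tag{13}$$

Here the diamond ($\diamond$) operator is defined by 

$$\left\langle\frac{\delta\ell}{\delta a}\diamond a,\, X\right\rangle_{\mathfrak{X}} := \left\langle\frac{\delta\ell}{\delta a},\, -\mathcal{L}_Xa\right\rangle_{V}. \tag{14}$$

In addition, $X \in \mathfrak{X}$ is a (smooth) vector field defined on $\mathbb{R}^2$ and $a \in V$, a vector space 
of advected quantities, which are here the scalar function, $\rho$, and the areal density 
$D\,d^2r$. Using the advection relations for $D$ and $\rho$ in (11) and the corresponding 
variational derivatives in (10) simplifies the EP equation in (13) to 

$$(\partial_t + \mathcal{L}_{\hat{\mathbf{v}}})\left(\frac{1}{D\rho}\frac{\delta\ell}{\delta\hat{\mathbf{v}}}\right) = \frac{1}{\rho}\nabla_{\mathbf{r}}\frac{\delta\ell}{\delta D}\cdot d\mathbf{r} - \frac{1}{D\rho}\frac{\delta\ell}{\delta\rho}\nabla_{\mathbf{r}}\rho\cdot d\mathbf{r}.$$

Equation (10) then yields 

$$(\partial_t + \mathcal{L}_{\hat{\mathbf{v}}})\big(\hat{\mathbf{v}}\cdot d\mathbf{r} + \sigma^2\hat{w}\,d\zeta\big) = -\rho^{-1}dp + d\varpi. \tag{15}$$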
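Inserting the last relation into the following standard relation for the time derivative 
of a loop integral then completes the proof of Eq. (12) appearing in the statement of 
the theorem, 

$$\frac{d}{dt}\oint_{c(\hat{\mathbf{v}})}\big(\hat{\mathbf{v}}\cdot d\mathbf{r} + \sigma^2\hat{w}\,d\zeta\big) = \oint_{c(\hat{\mathbf{v}})}(\partial_t + \mathcal{L}_{\hat{\mathbf{v}}})\big(\hat{\mathbf{v}}\cdot d\mathbf{r} + \sigma^2\hat{w}\,d\zeta\big) = \oint_{c(\hat{\mathbf{v}})}\big(-\rho^{-1}dp + d\varpi\big). \tag{16}$$

Using the advection relations for $D$ and $\rho$ in (11) again and combining with the 
variational relations with respect to $\zeta$ in (10) simplifies the $\hat{w}$ and $\zeta$ equations, as 
follows. 

$$\sigma^2(\partial_t + \mathcal{L}_{\hat{\mathbf{v}}})\hat{w} = -\frac{\rho_{\mathrm{ref}}}{\rho\,\mathrm{Fr}^2}\,\zeta, \qquad (\partial_t + \mathcal{L}_{\hat{\mathbf{v}}})\zeta = (\partial_t + \hat{\mathbf{v}}\cdot\nabla_{\mathbf{r}})\zeta = \hat{w}. \tag{17}$$

After deriving these equations, one may finally evaluate the constraint $D = 1$ 
imposed by the variation in pressure $p$ to obtain further simplifications. □ 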


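Corollary 2 (Kelvin-Noether Circulation Theorem for the Current) The Kelvin 
circulation theorem for the current alone is given by 

$$\frac{d}{dt}\oint_{c(\hat{\mathbf{v}})}\hat{\mathbf{v}}\cdot d\mathbf{r} = -\oint_{c(\hat{\mathbf{v}})}\left(\frac{1}{\rho}\,dp - d\frac{|\hat{\mathbf{v}}|^2}{2}\right). \tag{18}$$

Proof Equation (18) follows by shifting the $\hat{w}\,d\zeta$ term in Eq. (16) to the right-hand 
side, as 

$$\begin{aligned}
\frac{d}{dt}\oint_{c(\hat{\mathbf{v}})}\hat{\mathbf{v}}\cdot d\mathbf{r}
&= -\oint_{c(\hat{\mathbf{v}})}\left(\frac{1}{\rho}\,dp + \sigma^2(\partial_t + \mathcal{L}_{\hat{\mathbf{v}}})(\hat{w}\,d\zeta) - d\varpi\right)\\
&= -\oint_{c(\hat{\mathbf{v}})}\left(\frac{1}{\rho}\,dp + \sigma^2\big((\partial_t + \hat{\mathbf{v}}\cdot\nabla_{\mathbf{r}})\hat{w}\big)\,d\zeta + \sigma^2\hat{w}\,d\hat{w} - d\varpi\right)\\
&= -\oint_{c(\hat{\mathbf{v}})}\left(\frac{1}{\rho}\,dp - \frac{\rho_{\mathrm{ref}}}{\mathrm{Fr}^2\rho}\,\zeta\,d\zeta + \sigma^2\hat{w}\,d\hat{w} - d\varpi\right)\\
&= -\oint_{c(\hat{\mathbf{v}})}\left(\frac{1}{\rho}\,dp - d\frac{|\hat{\mathbf{v}}|^2}{2}\right).
\end{aligned}\tag{19}$$
□ 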


Remark 1 (Separation of Wave and Current Circulation) The decoupling of the 
Kelvin-Noether circulation theorem into its wave and current components, leading 
to the reduction of the current flow to the Euler result in Eq. (18), was also observed 
in [8]. This behaviour is consistent with the Charney-Drazin 'non-acceleration' 
theorem [6, 23]. Namely, in certain circumstances, wave activity does not create 
circulation in the mean current. A modification that allows exchange of circulation 
between wave (vertical) and current (horizontal) components of the flow was 
proposed in [8]. The instabilities observed around the edges of eddies in the satellite 
imagery shown in Fig. 1 suggest that a coupling of this sort may exist at high wave 
number. 


Remark 2 It is clear from Eqs. (16)-(18) that generation of circulation of the current 
by the dynamics in Eq. (15) requires non-zero $\nabla_{\mathbf{r}}\rho \times \nabla_{\mathbf{r}}p$. No current circulation is 
generated by wave variables in the case of constant buoyancy. 


2.3 Thermal Potential Vorticity (TPV) Dynamics on a Free 
Surface 


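The momentum map arising from the variations in (10) is given by 

$$\frac{1}{D}\frac{\delta\ell}{\delta\hat{\mathbf{v}}} = \rho\,\hat{\mathbf{v}}\cdot d\mathbf{r} + \sigma^2\rho\,\hat{w}\,d\zeta. \tag{20}$$

As expected from the well-known non-acceleration theorem [6, 23], the 
Euler-Poincaré equation (15) separates into the dynamics of the fluid and 
wave components of the momentum one-form (20), 

$$\begin{aligned}
(\partial_t + \mathcal{L}_{\hat{\mathbf{v}}})\big(\rho\,\hat{\mathbf{v}}\cdot d\mathbf{r}\big) &= -dp + \frac{\rho}{2}\,d\big(|\hat{\mathbf{v}}|^2\big),\\
(\partial_t + \mathcal{L}_{\hat{\mathbf{v}}})\big(\sigma^2\rho\,\hat{w}\,d\zeta\big) &= -\frac{\rho_{\mathrm{ref}}}{\mathrm{Fr}^2}\,\zeta\,d\zeta + \sigma^2\rho\,\hat{w}\,d\hat{w}.
\end{aligned}\tag{21}$$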


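The mass-weighted thermal potential vorticity (TPV) also separates into fluid and 
wave components $Q = Q_F + Q_W$ with the following definitions 

$$\begin{aligned}
Q\,d^2r &:= d\big(\rho\,\hat{\mathbf{v}}\cdot d\mathbf{r} + \sigma^2\rho\,\hat{w}\,d\zeta\big)\\
&= d\rho\wedge\big(\hat{\mathbf{v}}\cdot d\mathbf{r} + \sigma^2\hat{w}\,d\zeta\big) + \rho\big(\hat{\mathbf{z}}\cdot\mathrm{curl}\,\hat{\mathbf{v}} + \sigma^2J(\hat{w}, \zeta)\big)\,d^2r\\
&= \big(\mathrm{div}(\rho\nabla\psi) + \sigma^2J(\rho\hat{w}, \zeta)\big)\,d^2r \quad\text{when}\quad \hat{\mathbf{v}} = \nabla^\perp\psi \ \text{for}\ D = 1,\\
&\text{with}\quad Q_F := \mathrm{div}(\rho\nabla\psi), \quad Q_W := J(\sigma^2\tilde{w}, \zeta),
\end{aligned}\tag{22}$$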
where buoyancy weighted vertical velocity is defined as $\tilde{w} := \rho\hat{w}$. The dynamics of 
$Q_F\,d^2r$ and $Q_W\,d^2r$ can be computed from (21) as 

$$\begin{aligned}
(\partial_t + \mathcal{L}_{\hat{\mathbf{v}}})(Q_F\,d^2r) &= \frac{1}{2}\,d\rho\wedge d|\hat{\mathbf{v}}|^2 = \frac{1}{2}J\big(\rho, |\nabla\psi|^2\big)\,d^2r,\\
(\partial_t + \mathcal{L}_{\hat{\mathbf{v}}})(Q_W\,d^2r) &= \frac{\sigma^2}{2}\,d\rho\wedge d\big(\hat{w}^2\big) = \frac{\sigma^2}{2}J\Big(\rho, \frac{\tilde{w}^2}{\rho^2}\Big)\,d^2r.
\end{aligned}\tag{23}$$

From the two relations in (23), one sees that the buoyancy gradient $\nabla\rho$ couples the 
PV dynamics of the waves ($Q_W$) and currents ($Q_F$), each to their corresponding 
kinetic energy. In the case of constant buoyancy, $d\rho = 0$ in (23); so, the PVs of the 
waves and currents would be separately advected. 

The operator $\mathrm{div}(\rho\nabla)$ is invertible, so long as $\rho$ is a differentiable positive 
function, which can be ensured by requiring that this condition holds initially. 
Consequently, the stream function $\psi$ is related to the other fluid variables by 

$$\psi := (\mathrm{div}\,\rho\nabla)^{-1}Q_F. \tag{24}$$


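The potential vorticity dynamics can then be written in coordinate form as 

$$\begin{aligned}
&\partial_tQ_F + J(\psi, Q_F) = \frac{1}{2}J\big(\rho, |\nabla\psi|^2\big),\\
&\partial_tQ_W + J(\psi, Q_W) = \frac{\sigma^2}{2}J\Big(\rho, \frac{\tilde{w}^2}{\rho^2}\Big),\\
&\text{with}\quad Q_F := \mathrm{div}(\rho\nabla\psi) \quad\text{and}\quad Q_W := J(\sigma^2\tilde{w}, \zeta),\\
&\partial_t\rho + J(\psi, \rho) = 0,\\
&\partial_t\zeta + J(\psi, \zeta) = \hat{w} = \tilde{w}/\rho,\\
&\partial_t(\sigma^2\tilde{w}) + J(\psi, \sigma^2\tilde{w}) = -\frac{\rho_{\mathrm{ref}}\,\zeta}{\mathrm{Fr}^2}.
\end{aligned}\tag{25}$$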
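
The coordinate form (25) is written entirely in terms of the Jacobian bracket $J(\cdot,\cdot)$. The following is a minimal sketch of what that bracket computes, under the assumption (a common convention, not spelled out in this section) that $J(f, g) = f_xg_y - f_yg_x$ and that the velocity of a stream function is $\hat{\mathbf{v}} = \nabla^\perp\psi = (-\psi_y, \psi_x)$. The fields `psi` and `rho` are made-up test functions. The sketch checks that advection by the stream-function velocity equals $J(\psi, \rho)$, and that the bracket of a field with itself vanishes.

```python
import math

H = 1e-5  # finite-difference step

def grad(f, x, y):
    """Centred-difference gradient of a scalar function f(x, y)."""
    return ((f(x + H, y) - f(x - H, y)) / (2 * H),
            (f(x, y + H) - f(x, y - H)) / (2 * H))

def jacobian(f, g, x, y):
    """J(f, g) = f_x g_y - f_y g_x (assumed sign convention)."""
    fx, fy = grad(f, x, y)
    gx, gy = grad(g, x, y)
    return fx * gy - fy * gx

def advection(psi, rho, x, y):
    """v . grad(rho) with v = (-psi_y, psi_x), the velocity of stream function psi."""
    px, py = grad(psi, x, y)
    rx, ry = grad(rho, x, y)
    return -py * rx + px * ry

psi = lambda x, y: math.sin(x) * math.cos(2 * y)   # hypothetical stream function
rho = lambda x, y: 1.0 + 0.1 * math.cos(x + y)     # hypothetical buoyancy

x, y = 0.4, 1.1
print(jacobian(psi, rho, x, y), advection(psi, rho, x, y))
print(jacobian(rho, rho, x, y))   # bracket of a field with itself
```

Under this convention the equation $\partial_t\rho + J(\psi, \rho) = 0$ in (25) is the advection of buoyancy by the current, and the right-hand sides $J(\rho, \cdot)$ vanish whenever the buoyancy is constant.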

Theorem 3 The Legendre transform yields the Hamiltonian formulation of our 
system of wave-current equations (25), which with $\tilde{w} = \rho\hat{w}$ may be written in the 
untangled block-diagonal Poisson form as 

Q J(Q,-) J(p, +) 0 0 ôh/8Q = y 

p | | spy 0 0 , ôh/ôp = © 
alon 0 0 0- 5h/5(o7W) = B/p + JE, Y) 

t 0 0 1 i ôh/ôt = -J (0°, y) + eB 


(26) 
The energy Hamiltonian $h(Q, \rho, \tilde{w}, \zeta)$ associated with this system is given by 
a 1 n ; 2 z 
h(Q, p, Ù, t) = f(e - J(0°ŭ, ¢))@ive (o = J (0°. ¢)) 


202 2 
o~w Pref © 2 
ar. 
ee a, arf)? ’ 


(27) 


Theorem 4 (Casimir Functions) The Casimir functions, conserved by the relation 
$\{C_{\Phi,\Psi}, h\} = 0$ with any Hamiltonian h(M, D) for the block-diagonal Lie-Poisson 
bracket in Eq. (26) are given by 

$$C_{\Phi,\Psi} := \int \Phi(\rho) + Q\,\Psi(\rho)\;d^2r. \tag{28}$$

Proof That the $C_{\Phi,\Psi}$ are Casimirs for the direct sum of the Lie-Poisson brackets for $Q$ and 
$\rho$ and canonical Poisson brackets for $\tilde{w}$ and $\zeta$ follows by direct verification that the 
$C_{\Phi,\Psi}$ are conserved for any differentiable functions, $(\Phi, \Psi)$. □ 


2.4 CoM Equations in the Slowly Varying Envelope (SVE) 
Approximation 


The SVE Solutions Apply to Satellite Observations of Sea Surface Waves 
From the viewpoint of satellite observations, the vertical motion on the sea surface 
typically oscillates much more quickly than the rate of change of features in the 
horizontal motion of the ocean surface currents. In this situation, the standard WKB 
approximation introduces a solution Ansatz for the slowly varying envelope (SVE) 
of the rapidly oscillating vertical wave elevation in the standard form [2, 11], 


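$$\zeta(\mathbf{r}, t) = \Re\left(a(\mathbf{r}, t)\exp\Big(\frac{i\theta(\mathbf{r}, t)}{\epsilon}\Big)\right) \quad\text{with}\quad \epsilon \ll 1. \tag{29}$$

The SVE solution Ansatz (29) comprises the product of a slowly varying complex 
amplitude $a(\mathbf{r}, t) \in \mathbb{C}$ multiplied by a rapidly oscillating phase $\theta(\mathbf{r}, t)/\epsilon \in \mathbb{R}$ with 
$\epsilon \ll 1$ in which the phase factor $\theta(\mathbf{r}, t)$ may also vary slowly as a function of the 
space and time variables, $(\mathbf{r}, t)$. 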
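
The role of the envelope in the Ansatz (29) can be illustrated by averaging over the fast phase. The sketch below is illustrative only: the amplitude `a` and the small parameter `eps` are made-up values, held fixed over one fast cycle as the slowly varying assumption suggests. It checks numerically that the phase average of $\zeta^2$ equals $|a|^2/2$, which is why the squared amplitude $|a|^2$ is the natural slow variable.

```python
import cmath
import math

def elevation(a, theta, eps):
    """zeta = Re(a * exp(i * theta / eps)) for complex amplitude a and real phase theta."""
    return (a * cmath.exp(1j * theta / eps)).real

def phase_average_of_square(a, eps, samples=1000):
    """Average zeta^2 over one full cycle of the fast phase theta / eps."""
    total = 0.0
    for k in range(samples):
        theta = 2 * math.pi * eps * k / samples   # theta / eps sweeps [0, 2*pi)
        total += elevation(a, theta, eps) ** 2
    return total / samples

a = 0.3 + 0.4j   # hypothetical amplitude with |a|^2 = 0.25
eps = 1e-2       # hypothetical small parameter
print(phase_average_of_square(a, eps), abs(a) ** 2 / 2)
```

The average is independent of the argument of $a$ and of the value of $\epsilon$; only the envelope $|a|^2$ survives the phase average.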


Following [11], let us substitute the SVE solution Ansatz (29) into Hamilton's 
principle in (9) and find the condition on the parameter $\epsilon \ll 1$ that will allow higher 
order wave terms to be neglected. For this, one computes 
b 
0 = ôSsvE =s f lsve(, D, p; a, 0) dt 
a 
b 2 2 
1. dt\2 : 
=s f | Do0? — p(D- 1) + = o( ( =) Pref s 5) Prat 
a JD2 2 dt p 2o-Fr 
b 1 
=è f f Zom- pw) 
a D2 
2 2 2 2 2 
ue Dp ee ee (a2) la| (2) Pref € ) Brdt 
8 dt edt dt e2 \\dt p o2Fr2 


b 
— 
zaf I 5 Doll? — pD- D 
a D 


o?|a|? a 2 Pref e 2 o? 
Haa Do( (80 +3: V,8) o -oL rdt+ o( g ) ; 
(30) 


The leading order wave term $O(\epsilon^{-2})$ with $\epsilon \ll 1$ in Hamilton's principle will 
dominate the solution and the remaining wave terms in the second line of Eq. (30) 
may be neglected, when² 

$$\epsilon \ll 1, \qquad \frac{\epsilon^2}{\sigma^2\mathrm{Fr}^2} = O(1), \qquad\text{and}\qquad \sigma^2\mathrm{Fr}^2 \ll 1. \tag{31}$$

According to the estimates in (7) there is a range of physical parameters relevant to 
satellite observations in which the SVE approximation applies, for $\sigma^2\mathrm{Fr}^2 \ll 1$. 

To continue the investigation of the SVE description of wave-current interactions 
on the sea surface, we take variations of the action integral (30) to find the following 
set of equations, 


8e dé ' o?|al? 
D: = Dpe(t-dr+Nd—) odr with N= ' 
Og a eee aaea 
ôL o? dON2 Pref o? 
ôlaļ? : = D L)=0 a o(S 
lal: Sja = arr o( (3) p ) (5) 
do JPPref 
ro o+d-k=+ pie with œ(r,t)=—30 and k(r,t)=V,6, 
p 
$0: E om aA+dv(AD=0, with A= Don @” and N aal 
ee + div(Av) = 0, = = =a 
50 i at 4e? 
ôl P an2 
ôD: — => = 
SD 5 Pl Ps 
ôl Dan 
spi == 
ôp 2 
ép: D-1=0 div, =0, Hence, 3 A+T. V, A=0 ala? +T- V,a? =0. 
(32) 


² The ratio $\epsilon^2/(\sigma^2\mathrm{Fr}^2) = O(1)$ is required for the rate of change of the phase parameter $\theta(\mathbf{r}, t)$ 
of the SVE wave solution Ansatz (29) to match the time scale of the density $\rho(\mathbf{r}, t)$ in Eq. (30). 
In the second line of (32) we see that $|a|^2$ acts as a Lagrange multiplier: 
stationarity of the action integral with respect to variations in $|a|^2$ imposes a constraint 
which relates the dynamics of the wave phase $\theta$ to the buoyancy. This constraint 
relation involves the Doppler-shifted frequency of the waves, as shown in the third 
line of (32). In combination with conservation of the wave action density and the 
divergence-free condition on the fluid flow velocity $\hat{\mathbf{v}}$, this constraint relation implies 
in the last line of (32) that the wave magnitude $|a|^2$ is advected by the fluid flow. 
Because of the oscillatory nature of the solution Ansatz (29), the sign of the wave 
phase in $d\theta/dt = \partial_t\theta + \hat{\mathbf{v}}\cdot\nabla_{\mathbf{r}}\theta$ in the second line above is immaterial. Hence, 
hereafter, we will choose the positive root for $d\theta/dt = \sqrt{\rho\rho_{\mathrm{ref}}}/\rho$. 

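From the conservation of wave action density $A$ in (32) and the definitions of the 
advected fluid variables, one finds that $|a|^2$, $D$ and $\rho$ satisfy the following advection 
relations 

$$\begin{aligned}
(\partial_t + \mathcal{L}_{\hat{\mathbf{v}}})(D\,d^2r) = 0 \;&\Longrightarrow\; \partial_tD + \mathrm{div}_{\mathbf{r}}(D\hat{\mathbf{v}}) = 0 \quad\text{with}\quad D = 1,\\
(\partial_t + \mathcal{L}_{\hat{\mathbf{v}}})\rho = 0 \;&\Longrightarrow\; \partial_t\rho + \hat{\mathbf{v}}\cdot\nabla_{\mathbf{r}}\rho = 0,\\
(\partial_t + \mathcal{L}_{\hat{\mathbf{v}}})|a|^2 = 0 \;&\Longrightarrow\; \partial_t|a|^2 + \hat{\mathbf{v}}\cdot\nabla_{\mathbf{r}}|a|^2 = 0,
\end{aligned}\tag{33}$$

where $\mathcal{L}_{\hat{\mathbf{v}}}$ denotes the Lie derivative operation along the horizontal velocity vector 
field, $\hat{\mathbf{v}}$. The Lie derivative notation $\mathcal{L}_{\hat{\mathbf{v}}}$ provides coordinate-free brevity in proving 
the following Kelvin circulation theorem for thermal wave-current theory. 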


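Theorem 5 (Kelvin-Noether Circulation Theorem) The variational equations in 
(32) imply the following Kelvin circulation theorem 

$$\frac{d}{dt}\oint_{c(\hat{\mathbf{v}})}\left(\hat{\mathbf{v}}\cdot d\mathbf{r} + N\,d\frac{d\theta}{dt}\right) = -\oint_{c(\hat{\mathbf{v}})}\frac{1}{\rho}\,dp. \tag{34}$$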


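Proof The Euler-Poincaré (EP) theorem [16] in this case yields 

$$(\partial_t + \mathcal{L}_{\hat{\mathbf{v}}})\frac{\delta\ell}{\delta\hat{\mathbf{v}}} = \frac{\delta\ell}{\delta D}\diamond D + \frac{\delta\ell}{\delta\rho}\diamond\rho. \tag{35}$$

Here, the diamond ($\diamond$) operator is defined for a fluid advected quantity $f$ by 

$$\left\langle\frac{\delta\ell}{\delta f}\diamond f,\, X\right\rangle_{\mathfrak{X}} := \left\langle\frac{\delta\ell}{\delta f},\, -\mathcal{L}_Xf\right\rangle_{V}. \tag{36}$$

In (36), $X \in \mathfrak{X}(\mathbb{R}^2)$ is a (smooth) vector field defined on $\mathbb{R}^2$ and $f \in V$ is an element 
of a vector space of advected quantities. These advected quantities are the scalar function, $\rho$, 
and the areal density, $D\,d^2r$. 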

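Upon using the advection relations for $D$ and $\rho$ in (33) and the corresponding 
variational derivatives in (32), the EP equation in (35) simplifies to 

$$(\partial_t + \mathcal{L}_{\hat{\mathbf{v}}})\left(\frac{1}{D\rho}\frac{\delta\ell}{\delta\hat{\mathbf{v}}}\right) = \frac{1}{\rho}\nabla_{\mathbf{r}}\frac{\delta\ell}{\delta D}\cdot d\mathbf{r} - \frac{1}{D\rho}\frac{\delta\ell}{\delta\rho}\nabla_{\mathbf{r}}\rho\cdot d\mathbf{r}.$$

Equation (32) then yields 

$$(\partial_t + \mathcal{L}_{\hat{\mathbf{v}}})\left(\hat{\mathbf{v}}\cdot d\mathbf{r} + N\,d\frac{d\theta}{dt}\right) = -\rho^{-1}dp + d\Big(\frac{1}{2}|\hat{\mathbf{v}}|^2\Big). \tag{37}$$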
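Inserting the last relation into the following standard relation for the time derivative 
of a loop integral then completes the proof of Eq. (34) appearing in the statement of 
the theorem, 

$$\frac{d}{dt}\oint_{c(\hat{\mathbf{v}})}\left(\hat{\mathbf{v}}\cdot d\mathbf{r} + N\,d\frac{d\theta}{dt}\right) = \oint_{c(\hat{\mathbf{v}})}(\partial_t + \mathcal{L}_{\hat{\mathbf{v}}})\left(\hat{\mathbf{v}}\cdot d\mathbf{r} + N\,d\frac{d\theta}{dt}\right) = \oint_{c(\hat{\mathbf{v}})}\left(-\rho^{-1}dp + d\Big(\frac{1}{2}|\hat{\mathbf{v}}|^2\Big)\right). \tag{38}$$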


Note, however, that Eqs. (32) imply the following combination of advected quantities, 
(0, + LD Mae = cae (0, + Lz) {| [ad Pref =0 (39) 
~ — A a => A 
: z dt 4Fr2`' k p 


Consequently, the wave-momentum 1-form $N\,d\big(\tfrac{d\theta}{dt}\big)$ is advected by the fluid flow 
and the Kelvin circulation theorem in Eq. (38) reduces to the standard circulation 
theorem for the 2D Euler fluid equations. □ 


Remark 3 (Separation of Wave and Current Motion in the SVE Approximation) The 
decoupling of the Kelvin-Noether circulation theorem into its wave and current 
components for the SVE approximation is inherited from the un-approximated 
model. If the un-approximated model is modified in a way that removes this 
property, one would expect the corresponding SVE approximation to lose the 
non-acceleration result as well. 


Remark 4 Equation (39) implies advection of the 1-form $|a|^2d\rho$, which in turn 
implies advection of the Jacobian $J(|a|^2, \rho)$. Since the fluid flow is area preserving, 
$\mathrm{div}\,\hat{\mathbf{v}} = 0$, the following 2-form will also be advected, 

$$(\partial_t + \hat{\mathbf{v}}\cdot\nabla_{\mathbf{r}})\big(d|a|^2\wedge d\rho\big) = 0. \tag{40}$$

Thus, the divergence-free flow of $\hat{\mathbf{v}}$ preserves the area element $d|a|^2\wedge d\rho$. This 
means that if the gradients $\nabla|a|^2$ and $\nabla\rho$ are not aligned initially, then they will 
remain so. It also means that equilibrium solutions of (40) will be symplectic 
manifolds [14]. 
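
The advection of the Jacobian of two advected scalars by an area-preserving flow can be checked on a simple example. The sketch below is an illustration under stated assumptions, not the paper's simulation: the flow is a rigid rotation (chosen because its flow map is known in closed form and preserves area), and `amp0` and `rho0` are made-up initial fields standing in for $|a|^2$ and $\rho$.

```python
import math

H = 1e-5  # finite-difference step

def rotate(r, angle):
    """Rotate the point r about the origin by the given angle."""
    x, y = r
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)

def advected(f0, t, omega=0.5):
    """Scalar carried by the rigid rotation: f(r, t) = f0(phi_t^{-1} r)."""
    return lambda x, y: f0(*rotate((x, y), -omega * t))

def jacobian(f, g, x, y):
    """J(f, g) = f_x g_y - f_y g_x by centred differences."""
    fx = (f(x + H, y) - f(x - H, y)) / (2 * H)
    fy = (f(x, y + H) - f(x, y - H)) / (2 * H)
    gx = (g(x + H, y) - g(x - H, y)) / (2 * H)
    gy = (g(x, y + H) - g(x, y - H)) / (2 * H)
    return fx * gy - fy * gx

amp0 = lambda x, y: math.exp(-(x - 1.0) ** 2 - y ** 2)   # hypothetical initial |a|^2
rho0 = lambda x, y: 1.0 + 0.1 * math.sin(2 * y)          # hypothetical initial buoyancy

t, r0 = 1.3, (0.8, 0.2)
rt = rotate(r0, 0.5 * t)                                  # parcel position at time t
j_initial = jacobian(amp0, rho0, *r0)
j_later = jacobian(advected(amp0, t), advected(rho0, t), *rt)
print(j_initial, j_later)
```

The Jacobian evaluated at the moving parcel keeps its initial value up to finite-difference error, so gradients that start out misaligned at a parcel stay misaligned along its path, as stated in the remark.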
After deriving these equations, one may finally evaluate the constraint D = 1 
imposed by the variation in pressure p to obtain further simplifications. 


2.5 Thermal Potential Vorticity Dynamics with SVE on a Free 
Surface 


The momentum map arising from the variations of the action in (32) is given by 


1 dé dé : 2N7 lal? dé 
= P(e: dr +Nd—) with N:= 2 lal =: l]a? and —= i 
V p 


Ds 4 dt 
186 04 
so 5 se = (8: dr + Plal’d(Vppre7/0)).- 


(41) 


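According to the Euler-Poincaré equation (37), the dynamics of the fluid and wave 
components of the 1-form in (41) separates into the following equations, 

$$\begin{aligned}
(\partial_t + \mathcal{L}_{\hat{\mathbf{v}}})\big(\rho\,\hat{\mathbf{v}}\cdot d\mathbf{r}\big) &= -dp + \frac{\rho}{2}\,d\big(|\hat{\mathbf{v}}|^2\big),\\
(\partial_t + \mathcal{L}_{\hat{\mathbf{v}}})\big(|a|^2\,d\sqrt{\rho\rho_{\mathrm{ref}}}\big) &= 0.
\end{aligned}\tag{42}$$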

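This means that the mass-weighted thermal potential vorticity (TPV) dynamics also 
separates into the following fluid and wave components, $Q = Q_F + Q_W$, given by 

$$\begin{aligned}
Q\,d^2r &:= d\left(\rho\Big(\hat{\mathbf{v}}\cdot d\mathbf{r} + \Gamma|a|^2\,d\frac{\sqrt{\rho\rho_{\mathrm{ref}}}}{\rho}\Big)\right)\\
&= \Big(\mathrm{div}(\rho\nabla\psi) - \Gamma J\big(|a|^2, \sqrt{\rho\rho_{\mathrm{ref}}}\big)\Big)\,d^2r \quad\text{when}\quad \hat{\mathbf{v}} = \nabla^\perp\psi \ \text{for}\ D = 1,\\
&=: Q_F\,d^2r + Q_W\,d^2r,\\
&\text{with}\quad Q_F := \mathrm{div}(\rho\nabla\psi) \quad\text{and}\quad Q_W := \Gamma J\big(\sqrt{\rho\rho_{\mathrm{ref}}}, |a|^2\big).
\end{aligned}\tag{43}$$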


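Then, again, the differentials of the separate equations in (42) yield the ‘non-acceleration’
result,

$$(\partial_t + \mathcal{L}_{\hat{v}})\,(Q_F\,d^2r) = \frac{1}{2}\,d\rho\wedge d|\hat{v}|^2 = \frac{1}{2}\,J(\rho, |\nabla\psi|^2)\,d^2r,$$
$$(\partial_t + \mathcal{L}_{\hat{v}})\,(Q_W\,d^2r) = 0. \tag{44}$$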


Equivalently, in coordinates one has 
$$\partial_t Q_F + \hat{v}\cdot\nabla Q_F = \frac{1}{2}\,J(\rho, |\nabla\psi|^2),$$
$$\partial_t Q_W + \hat{v}\cdot\nabla Q_W = 0,$$
with $Q_F := \mathrm{div}(\rho\nabla\psi)$ and $Q_W := \Gamma J(\sqrt{\rho\rho_{ref}}, |a|^2)$,
2 (45) 


P oO 
dop +v-Vro=0 and l= °, 


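$$\partial_t |a|^2 + \hat{v}\cdot\nabla_r |a|^2 = 0,$$
$$\partial_t \theta + \hat{v}\cdot\nabla_r \theta = \frac{\sqrt{\rho\rho_{ref}}}{\rho}.$$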


The operator $(\mathrm{div}\,\rho\nabla)$ is invertible, so long as $\rho$ is a differentiable positive function,
which can be ensured by requiring that this condition holds initially, since $\rho$ is
advected. Consequently, the stream function $\psi$ is related to the other fluid variables
by

$$\psi := (\mathrm{div}\,\rho\nabla)^{-1} Q_F. \tag{46}$$
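
The inversion in (46) can be illustrated in one dimension. The sketch below is not the finite element solver used in the paper; it is a minimal finite-difference analogue under stated assumptions (homogeneous Dirichlet conditions at both ends, a positive coefficient, and a hypothetical helper name `invert_stream_function_1d`) that solves $(\rho\psi')' = Q$ and checks the result against a manufactured solution.

```python
import numpy as np

def invert_stream_function_1d(q, rho_half, dy):
    """Solve d/dy(rho * dpsi/dy) = q with psi = 0 at both ends (1D analogue of (46)).

    q        : right-hand side at the n interior nodes
    rho_half : positive coefficient at the n + 1 cell midpoints
    dy       : uniform grid spacing
    """
    main = -(rho_half[:-1] + rho_half[1:]) / dy**2   # diagonal entries
    off = rho_half[1:-1] / dy**2                     # coupling between neighbours
    matrix = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    return np.linalg.solve(matrix, q)

# Manufactured test: psi = sin(pi y) and rho = 1 + 0.2 sin(2 pi y) > 0.
n = 199
y = np.linspace(0.0, 1.0, n + 2)
dy = y[1] - y[0]
y_half = 0.5 * (y[:-1] + y[1:])
rho_half = 1.0 + 0.2 * np.sin(2 * np.pi * y_half)
rho = 1.0 + 0.2 * np.sin(2 * np.pi * y)
drho = 0.4 * np.pi * np.cos(2 * np.pi * y)
psi_exact = np.sin(np.pi * y)
# (rho psi')' = rho' psi' + rho psi''
q = drho * np.pi * np.cos(np.pi * y) - rho * np.pi**2 * np.sin(np.pi * y)

psi = invert_stream_function_1d(q[1:-1], rho_half, dy)
max_error = np.max(np.abs(psi - psi_exact[1:-1]))
```

Positivity of the coefficient is what makes the discrete operator negative definite and hence invertible, mirroring the condition on $\rho$ stated above.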


The dynamics of the equation set (45) explains why the various physical components
of the flow coordinate their movements, as seen in satellite observations in Fig. 2.
In particular, the motion of buoyancy $\rho$ and squared wave amplitude $|a|^2$ are
coordinated with each other through the advection of the momentum 1-form $|a|^2 d\rho$
and the area 2-form $d|a|^2 \wedge d\rho$. Likewise, the motion of the fluid potential vorticity
$Q_F$ and the mass density $\rho$ are coordinated with each other through the mass-weighted
definition of the stream function in (46). These considerations emphasise
again the importance of horizontal buoyancy gradients in sea surface dynamics.


3 Numerical Implementation 


Our implementation of the CoM equations (25) and the CoM equations in the
SVE approximation (45) used the finite element method (FEM) for the spatial
variables. The FEM algorithm we used is based on the algorithm formulated in
[15] and is implemented using the Firedrake³ software. In particular, for (25) we
approximated the fluid potential vorticity $Q_F$, buoyancy $\rho$, wave elevation $\zeta$ and
buoyancy weighted wave vertical velocity $w$ using a first order discontinuous Galerkin
finite element space. Similarly, for (45), we approximated $Q_F$, $\rho$, square of the
wave amplitude $|a|^2$ and wave phase $\theta$ using a first order discontinuous Galerkin finite
element space. The stream function $\psi$ for both models was approximated by using
a first order continuous Galerkin finite element space. For the time integration, we
used the third order strong stability preserving Runge-Kutta method [12].

³ https://firedrakeproject.org/index.html.
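
As an illustration of the time integrator, the following sketch implements one common three-stage, third-order strong stability preserving Runge-Kutta scheme in Shu-Osher form. It is an assumption that this is the variant meant by [12]; the function names are hypothetical and the right-hand side is a scalar test problem rather than the discretised wave-current equations.

```python
import numpy as np

def ssprk3_step(rhs, y, t, dt):
    """Advance y by one step of the three-stage SSP RK3 scheme (Shu-Osher form)."""
    y1 = y + dt * rhs(t, y)
    y2 = 0.75 * y + 0.25 * (y1 + dt * rhs(t + dt, y1))
    return y / 3.0 + (2.0 / 3.0) * (y2 + dt * rhs(t + 0.5 * dt, y2))

def integrate(rhs, y0, t_end, n_steps):
    """Integrate from t = 0 to t_end with n_steps uniform SSP RK3 steps."""
    y, t = np.asarray(y0, dtype=float), 0.0
    dt = t_end / n_steps
    for _ in range(n_steps):
        y = ssprk3_step(rhs, y, t, dt)
        t += dt
    return y

def decay(t, y):
    """Test problem dy/dt = -y with exact solution exp(-t)."""
    return -y

# Halving the step should reduce the error by roughly a factor of 2**3.
err_coarse = abs(integrate(decay, 1.0, 1.0, 20) - np.exp(-1.0))
err_fine = abs(integrate(decay, 1.0, 1.0, 40) - np.exp(-1.0))
observed_order = np.log2(err_coarse / err_fine)
```

Each stage is a convex combination of forward Euler steps, which is the property that gives the scheme its strong stability preserving character.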

Figures 3 and 4 present snapshots of high resolution runs of the CoM equations
and the CoM equations in the SVE approximation. These simulations were run
with the following parameters. The domain is $[0, 1]^2$ at a resolution of $512^2$. The
boundary conditions are periodic in the x direction, and homogeneous Dirichlet for
$\psi$ in the y direction. To see the effects of the waves on the currents, the procedure
was divided into two stages for both sets of equations. The first stage was performed
without wave activity for $T_{spin} = 100$ time units starting from the following initial
conditions

$$\begin{aligned}
Q_F(x, y, 0) &= \sin(8\pi x)\sin(8\pi y) + 0.4\cos(6\pi x)\cos(6\pi y) + 0.3\cos(10\pi x)\cos(4\pi y) \\
&\quad + 0.02\sin(2\pi y) + 0.02\sin(2\pi x), \\
\rho(x, y, 0) &= 1 + 0.2\sin(2\pi x)\sin(2\pi y) \quad\text{and}\quad \rho_{ref} = 1.
\end{aligned} \tag{47}$$


The purpose of the first stage was to allow the system to spin up to a statistically
steady state without any wave activity. The PV and buoyancy variables at the end
of the initial spin-up period are denoted as $Q_{spin}(x, y) = Q_F(x, y, T_{spin})$ and
$\rho_{spin}(x, y) = \rho(x, y, T_{spin})$. These variables are shown in Fig. 5. In the
second stage, the full simulations including the wave variables were run with the
initial conditions for the flow variables being the state achieved at the end of the
first stage. To start the second stage for (25), wave variables were introduced with
the following initial conditions


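$$\begin{aligned}
\zeta(x, y, 0) &= \sin(8\pi x)\sin(8\pi y) + 0.4\cos(6\pi x)\cos(6\pi y) + 0.3\cos(10\pi x)\cos(4\pi y) \\
&\quad + 0.02\sin(2\pi y) + 0.02\sin(2\pi x), \\
w(x, y, 0) &= 0, \quad Q_F(x, y, 0) = Q_{spin}(x, y), \quad \rho(x, y, 0) = \rho_{spin}(x, y),
\end{aligned}$$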
o? Fr? =10°. 


(48) 


To start the second stage for (45), wave variables were introduced with the following
initial conditions

$$\begin{aligned}
|a|^2(x, y, 0) &= \big(\sin(8\pi x)\sin(8\pi y) + 0.4\cos(6\pi x)\cos(6\pi y) + 0.3\cos(10\pi x)\cos(4\pi y) \\
&\quad + 0.02\sin(2\pi y) + 0.02\sin(2\pi x)\big)^2, \\
\theta(x, y, 0) &= 0, \quad Q_F(x, y, 0) = Q_{spin}(x, y), \quad \rho(x, y, 0) = \rho_{spin}(x, y).
\end{aligned} \tag{49}$$
Fig. 5 These figures show the results of the first stage of the simulation in which only fluid motion
is present and the wave degrees of freedom are absent. The panels show fluid potential vorticity
$Q_F$ (left) and buoyancy $\rho$ (right). The fluid state obtained from the first stage was used as the
initial condition for the second stage simulations in which wave variables were included. These
distributions of fluid properties show strong spatial coherence. The coordination of wave and fluid
properties that emerges in the second stage of the simulations shown in Figs. 3 and 4 arises from
the interaction between the wave and current components of the flow which is mediated by the
buoyancy gradient


Remark 5 Importantly, the wave phase $\theta$ in the second stage was set initially to
zero. Thereafter, the wave phase $\theta$ increased linearly in time in proportion to
the advected quantity $\sqrt{\rho\rho_{ref}}/\rho$ following each flow line, as implied by the last
equation in (45).


4 Conclusion and Outlook 


This paper models the effects of thermal fronts on the dynamics of the ocean’s 
waves and currents. It introduces and simulates two models of thermal wave-current 
dynamics on a free surface. The original CoM model is derived from Hamilton’s 
principle via the composition of two maps which represent the horizontal and 
vertical motion respectively. The second, a slowly varying envelope (SVE) model, 
is introduced via the standard WKB approximation which takes advantage of large 
separation of the space-time scales between the slow horizontal currents and fast 
vertical oscillations. In particular, the second model introduces the WKB solution 
Ansatz into Hamilton’s principle, whereupon the time integral averages over the 
phases of the rapid oscillations that are out of resonance with the slowly varying 
envelope. Runs of both models are presented in which the buoyancy mediates
the dynamics of the currents and waves, as seen in Figs. 3 and 4. These simulations
also validate the use of the WKB approximation for two reasons. First, the resolved
small scale wave features of the original CoM model lie primarily within the
envelope defined by the SVE approximate model. This means that the dynamics of
the spatial features of the SVE approximate model are consistent with those of the
original CoM model, although the resolved space and time scales differ. Secondly,
requiring that $\epsilon^2/(Fr^2\sigma^2) = O(1)$ ensures that the time scale for the wave envelope
dynamics matches that of the fluid motion.

Nonetheless, the two models introduced here merit further study in several 
directions. For example, it remains to: (1) quantify the correlations observed 
visually; (2) determine their rate of formation; and (3) parameterise the model 
for comparison and analysis of the satellite data on which their derivations were 
based. Furthermore, the models discussed here involve only variables that are 
evaluated on the free surface and therefore they neglect bathymetry. A scientific 
challenge persists in understanding regions of the ocean where bathymetry has 
profound effects on the observable surface dynamics, such as in the Lofoten vortex 
[21]. This is a multiscale issue that might be addressed by including mesoscale 
modulations of the sub-mesoscale models derived here. One candidate for providing 
the mesoscale modulations would be the thermal quasi-geostrophic (TQG) model in
which bathymetry has recently been included [15]. 

The currents are modelled here by the two dimensional incompressible Euler 
equations, as seen in Eqs. (2) and (3). Incompressibility is a reasonable assumption 
in some regions of the ocean, for example when the quasigeostrophic approximation 
is valid. There are regions in the upper ocean where other equations are more 
suitable for modelling currents, and the development and investigation of such two 
dimensional models is an open problem which warrants further consideration. 

As mentioned in Remark 1, the wave component of the model presented here 
does not create circulation in the currents. The instabilities present in satellite 
simulations indicate that additional modelling is needed to fully capture this effect. 
Future work will investigate approaches for modelling these instabilities. 

Many other questions remain about wave-current interaction. The full extent of 
submesoscale ocean dynamics is by no means adequately described by existing 
models. For example, we have little understanding of the formation and dynamics 
of various sea-surface phenomena, including the so-called ‘spirals on the sea’ [18]. 
Other questions are emerging because the ocean has absorbed in excess of 90% 
of the heat present in the earth system as a result of human activity during the 
post-industrial era [19]. The absorption of heat from the warming atmosphere is 
ongoing and it is forecast to become more dramatic. This absorption has resulted 
in ‘marine heat waves’, which are predicted to increase in frequency and severity. 
These changes to the upper ocean, where most of this heat is stored, could 
have a profound effect on the dynamical landscape of our oceans. These effects 
may, in turn, influence our weather and climate systems. Over the millennia, the 
ocean has approached statistical equilibrium under its current forcing conditions. 
Using modelling terminology, one says the ocean is well ‘spun-up’. However, the 
continued warming of the ocean is likely to influence the number and intensity 
of thermal fronts. One hopes that mathematical models will provide a useful
framework for estimating some of the potential impacts of these thermal fronts on 
atmospheric effects, as well. 
and Their Applications to Primitive Equation Models


Ruiao Hu and Stuart Patching 


Abstract We present a numerical investigation into the stochastic parameteri- 
sations of the Primitive Equations (PE) using the Stochastic Advection by Lie 
Transport (SALT) and Stochastic Forcing by Lie Transport (SFLT) frameworks. 
These frameworks were chosen due to their structure-preserving introduction of 
stochasticity, which decomposes the transport velocity and fluid momentum into 
their drift and stochastic parts, respectively. In this paper, we develop a new 
calibration methodology to implement the momentum decomposition of SFLT 
and compare with the Lagrangian path methodology implemented for SALT. The 
resulting stochastic Primitive Equations are then integrated numerically using a 
modification of the FESOM2 code. For certain choices of the stochastic parameters, 
we show that SALT causes an increase in the eddy kinetic energy field and an 
improvement in the spatial spectrum. SFLT also shows improvements in these areas, 
though to a lesser extent. SALT does, however, have the drawback of an excessive 
downwards diffusion of temperature. 


Keywords Primitive equations - Geometric mechanics - FESOM2 - Stochastic 
parameterisation 


1 Introduction 


Uncertainty can be present in ocean models due to a number of factors including, 
but not limited to: small-scale processes not resolved by the grid; observation error; 
model error; numerical error and unrealistic viscosities imposed to ensure numerical 
stability. Several stochastic parameterisation techniques [PZ14, Ber05, Mem14, 
Hol15, HH21] have been proposed recently as ways of representing uncertainty in 
ocean models. Because these parameterisations are probabilistic, it is possible to 
generate ensemble forecasts [CCH+19, CCH+20, Cot20, UJPD21] with associated
means and variances, which can then be applied to data assimilation. This work 
will focus on two frameworks which introduce stochasticity in a way that preserves 
certain fundamental and desirable properties of fluid flows. These frameworks are: 
Stochastic Advection by Lie Transport (SALT) [Hol15] and Stochastic Forcing by 
Lie Transport (SFLT) [HH21]. Both SALT and SFLT are derived from variational 
principles, from which we may observe the geometric structure of the fluid equations 
and the conservation laws which are inherited. 

The key assumption of SALT is the decomposition of transport velocity into a 
slow mean part and a fast, rapidly fluctuating part around the mean. In the limit 
of high fluctuation frequency, one can use homogenisation theory to transform the 
rapidly-fluctuating component to a sum of stochastic vector fields [CGH17]. Thus, 
the modification from the deterministic flow is the addition of stochastic vector 
fields to the transport velocity. This stochastic modification has been shown [Hol15] 
to preserve the Kelvin circulation theorem and the advection equation for potential 
vorticity. In the case where buoyancy obeys an advection relation, the potential 
vorticity is conserved along particle paths. However, SALT violates energy conser- 
vation since stochastic Hamiltonians are introduced into the variational principle. 
The application of SALT to quasi-geostrophic (QG) models and the 2D Euler
equations has been investigated before in [CCH+20, CCH+19, Cot20]. However,
these models are too simplistic to be used in operational ocean simulations, and
the majority of ocean codes (e.g. MOM5 [GBB+00], ICON [Kor17], MITgcm
[MAH+97], FESOM2 [DSWJ16]) solve the Primitive Equations (PE). For this
reason, if SALT is to be employed in practical applications, it must be
adapted for use in PE. This introduces additional features to the model as compared
with QG or 2D Euler: in PE there are advected quantities such as temperature and
salinity, which in the SALT framework are advected by the stochastic velocity.
There is, moreover, a subtlety in the pressure arising from the imposition of a semi- 
martingale Lagrange multiplier in the incompressibility condition of the variational 
principle [SC21]. 

An alternative stochastic parameterisation is the more recent SFLT framework
[HH21]. Derived via a Lagrange-d'Alembert principle, SFLT allows the addition
of arbitrary stochastic forcings to the evolution equations of the momentum and
of the advected quantities. This modification differs from SALT, as stochasticity is
added in the variational principle after taking variations of the Hamiltonian for the
deterministic system. By considering the Lie-Poisson bracket of the system, we
choose the forcing to be of a particular form that preserves, on every realisation 
of the noise, the original (deterministic) Hamiltonian. For PE, the Hamiltonian is 
given in Eq. (2). However, the addition of energy preserving forces will modify 
the Kelvin circulation theorem. In the current work, we will consider the case 
where the stochastic forcing is in the energy conserving form and applied to the 
momentum equation. As in the SALT case, stochastic pressure terms will appear 
in the momentum equation due to the imposition of semi-martingale Lagrange 
multiplier in the incompressibility constraint. Prior to the present work, SFLT has 
not been implemented into numerical models. 
The rest of the paper is structured as follows. In Sect. 2, we derive PE with
both SALT and SFLT from a variational principle and we show the conservation 
properties from the resulting equations. In Sect. 3, we consider calibration pro- 
cedures to calculate the stochastic parameters of SALT and SFLT. In particular, 
we use the Lagrangian paths method of [CCH+20] but also consider a simpler 
technique, that of Eulerian differences, which we propose is more appropriate for 
use in SFLT. In Sect. 4, we present numerical results of applying SALT and SFLT 
to FESOM2 [DSWJ16] (see Sect. 5), demonstrating the different effects of these 
stochastic frameworks and the sensitivity to the choice of parameters. 


2 Stochastic Primitive Equations 


2.1 Variational Principles for Stochastic Primitive Equations 


Variational principles may be used to derive systems of fluid equations [HMR98, 
HSS09] which obey conservation laws such as the Kelvin circulation theorem. 
To derive the Primitive Equations from a variational principle, the appropriate 
Lagrangian is [HSS09]: 


$$l(\mathbf{u}, D, T, S) = \int \Big(\frac{1}{2}|\mathbf{u}|^2 + \mathbf{u}\cdot R - V(T, S, z)\Big)\,D\,d^3x, \tag{1}$$

where $\mathbf{u} = (u, v)$ is the horizontal velocity vector field, $R$ is the Coriolis potential,
which satisfies $\mathrm{curl}\,R = f(y)\hat{z}$ with $f(y) = 2\Omega\cos y$ and $\Omega = 2\pi/\mathrm{day}$ is
the rotational frequency of the earth. T and S are the temperature and salinity
respectively; these are tracers advected by the fluid. D is the Jacobian of the flow
map $g_t$ that maps a fluid particle at initial position $x_0$ to its position $x_t = g_t x_0$
at time t. V is the potential energy, which has explicit dependence on T and S, as
well as on the vertical coordinate z. The three-dimensional velocity shall be denoted
$\mathbf{v} = (\mathbf{u}, w)$.

In order to obtain the correct hydrostatic balance condition the potential energy
should obey $\frac{\partial V}{\partial z}(T, S, z) = g(1 + b)$ where the partial derivative is taken with respect
to z at constant T, S. Here b is the buoyancy, given by the equation of state $b = b(T, S, z)$.

It is convenient here to use the Clebsch version of the variational principle
[CH09] in Hamiltonian form. The Hamiltonian is given by Legendre transformation
as $h(m^{(H)}, D, T, S) := \langle \mathbf{u}, \frac{\delta l}{\delta \mathbf{u}} \rangle - l(\mathbf{u}, D, T, S)$ where $m^{(H)} := \frac{\delta l}{\delta \mathbf{u}} = D(\mathbf{u} + R)$ is the
horizontal momentum. We have also defined the inner product $\langle p, q \rangle = \int p \cdot q\,d^3x$.
We shall use the same angle-bracket notation for all such pairings, when p and q
are dual variables, e.g. a vector field and a 1-form density, or a scalar and a density. The
Hamiltonian can be written explicitly as:

$$h(m^{(H)}, D, T, S) = \int \Big(\frac{1}{2}\Big|\frac{m^{(H)}}{D} - R\Big|^2 + V(T, S, z)\Big)\,D\,d^3x. \tag{2}$$
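
As a check on Eq. (2), the Legendre transformation can be carried out explicitly. The short derivation below is a sketch that uses only the Lagrangian (1) and the definition of the horizontal momentum, written here as $m^{(H)}$ with bold $\mathbf{u}$ for the horizontal velocity (this notation is an assumption about the original typesetting).

```latex
\begin{align*}
m^{(H)} &= \frac{\delta l}{\delta \mathbf{u}} = D\,(\mathbf{u} + R)
  \quad\Longrightarrow\quad \mathbf{u} = \frac{m^{(H)}}{D} - R, \\
h &= \Big\langle \mathbf{u}, \frac{\delta l}{\delta \mathbf{u}} \Big\rangle - l
   = \int \Big[ D\,\mathbf{u}\cdot(\mathbf{u} + R)
     - D\Big(\tfrac12|\mathbf{u}|^2 + \mathbf{u}\cdot R - V\Big) \Big]\, d^3x \\
  &= \int \Big( \tfrac12|\mathbf{u}|^2 + V \Big) D\, d^3x
   = \int \Big( \tfrac12\Big|\frac{m^{(H)}}{D} - R\Big|^2 + V(T,S,z) \Big) D\, d^3x .
\end{align*}
```

The Coriolis term $\mathbf{u}\cdot R$ cancels, so the Hamiltonian is the sum of kinetic and potential energy, with rotation entering only through the shift of the momentum by $R$.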


In the Clebsch variational principle when SALT or SFLT are present, the (3-dimensional)
transport velocity $\mathrm{d}\chi$ is defined to be a stochastic process. The form
of $\mathrm{d}\chi$ is defined using Lagrange-multiplier constraints to impose the transport
equations $(\mathrm{d} + \mathcal{L}_{\mathrm{d}\chi})\,a = 0$, where $a \in \{D, T, S\}$ [see SW68]. Here we remark
that for clarity, we denote by an italic d the spatial differential and a straight
d for the stochastic time-increment. $\mathcal{L}_{\mathrm{d}\chi}$ denotes the Lie derivative, which is a
differential operator with a form that depends on the object on which it acts. We
remark here that there is a slight abuse of notation and we shall write D as a shorthand
for $D\,d^3x$ so that this is a density 3-form and the Lie derivative is given by
$\mathcal{L}_{\mathrm{d}\chi}D = \nabla\cdot(\mathrm{d}\chi\,D)$. T and S are scalars, so we have $\mathcal{L}_{\mathrm{d}\chi}T := \mathrm{d}\chi\cdot\nabla T$ and
similarly for S. In order to obtain the incompressibility of the transport velocity $\mathrm{d}\chi$,
we include an additional constraint to set D = 1 where the Lagrange multiplier
will be interpreted as the pressure. Since the Hamiltonian h only depends on the
horizontal momentum $m^{(H)}$, we need to include an extra constraint so that the vertical
component of the momentum is set to zero; this will give us hydrostatic balance.


The defining feature of SALT is that the transport velocity is the sum of the drift
velocity and a number of stochastic corrections to the drift:

$$\mathrm{d}\chi(x, t) = \mathbf{v}(x, t)\,\mathrm{d}t + \sum_i \xi_i(x, t)\circ\mathrm{d}W_t^i, \tag{3}$$

where $\xi_i(x, t)$ are arbitrary vector fields. We remark here that Eq. (3) is a stochastic
process at fixed Eulerian points x and we do not solve for this process explicitly.
$\mathrm{d}\chi$ is distinct from the particle trajectories $x_t$, which evolve in time according to
$\mathrm{d}x_t = \mathbf{v}(x_t, t)\,\mathrm{d}t + \sum_i \xi_i(x_t, t)\circ\mathrm{d}W_t^i$ and will be used during calibration procedures
in Sect. 3. We can impose the form of the transport velocity specified in Eq. (3) by
including in the action some additional stochastic Hamiltonians $\sum_i h_i(m^{(H)})\circ\mathrm{d}W_t^i$
where the horizontal component of the parameters is given by $\xi_i^{(H)}(x, t) = \frac{\delta h_i}{\delta m^{(H)}}$. The
three-dimensional momentum is denoted $m = (m^{(H)}, m_3)$. We note that in principle
$\xi_i$ may depend on time; however, we shall henceforth assume for simplicity that
$\xi_i = \xi_i(x)$ is a function of space only. When $h_i$ are independent of $m^{(H)}$, we have
the relation $\mathrm{d}\chi(x, t) := \mathbf{v}(x, t)\,\mathrm{d}t$, so that $\mathrm{d}\chi$ reduces to the original deterministic
transport.
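
The particle trajectories driven by the stochastic transport velocity can be sketched numerically. The example below is a minimal illustration, not the scheme used in the paper: it assumes a stochastic Heun (predictor-corrector) discretisation, which is consistent with the Stratonovich interpretation, and uses hypothetical names (`heun_stratonovich_path`, `rotation`, `xi_const`) with toy velocity fields.

```python
import numpy as np

def heun_stratonovich_path(v, xis, x0, dt, n_steps, rng):
    """Integrate dx_t = v(x_t, t) dt + sum_i xi_i(x_t) o dW^i_t with stochastic Heun."""
    x = np.array(x0, dtype=float)
    t = 0.0
    for _ in range(n_steps):
        # One Brownian increment per noise field, reused in predictor and corrector.
        dW = rng.normal(0.0, np.sqrt(dt), size=len(xis))

        def increment(pos, time):
            noise = sum(xi(pos) * dw for xi, dw in zip(xis, dW))
            return v(pos, time) * dt + noise

        predictor = x + increment(x, t)
        x = x + 0.5 * (increment(x, t) + increment(predictor, t + dt))
        t += dt
    return x

def rotation(pos, time):
    """Rigid rotation about the origin (toy drift velocity)."""
    return np.array([-pos[1], pos[0]])

def still(pos, time):
    """Zero drift velocity."""
    return np.zeros(2)

def xi_const(pos):
    """Spatially constant noise field (toy stand-in for xi_i)."""
    return np.array([1.0, 0.0])

# Without noise fields the scheme reduces to the deterministic Heun method.
x_det = heun_stratonovich_path(rotation, [], [1.0, 0.0], 1e-3, 1000,
                               np.random.default_rng(0))
# With zero drift and a constant xi, the path is x0 + xi * W_T.
x_noise = heun_stratonovich_path(still, [xi_const], [0.0, 0.0], 1e-3, 1000,
                                 np.random.default_rng(1))
```

Reusing the same Brownian increment in both stages is what distinguishes this from an Euler-Maruyama step, which would converge to the Itô rather than the Stratonovich solution when the noise fields vary in space.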


SFLT is included [HH21] via a Lagrange-d'Alembert term $\langle \delta\mathrm{d}\chi, F \rangle$ added to the
variation of the action $\delta S$. Since this is added after variations of the action are taken,
the forcing F can in principle be arbitrary. Overall, the variational principle takes
the following form:
0=s6S=5 J (dx, m) — h(m™, D, T, S)dt — (dt, m3) — (dP, D — 1)
+ (a, (d+ Lax) D) + (2, (d + Lax) T) + (y, (d + Lax) $) 


Ng 

-9 hi (m”, g) o dwi - f bax, P) . 
i=l 
———— m 


SFLT 


(4) 


SALT 


The first two lines of Eq. (4) are what would be included in the unmodified
variational principle. $\mathrm{d}\zeta$ is a Lagrange multiplier, enforcing $m_3 = 0$, and after taking
variations can be interpreted as the vertical component of the stochastic transport
velocity. Indeed, we may expand $\mathrm{d}\zeta = w\,\mathrm{d}t + \sum_i \xi_i^{(3)}\circ\mathrm{d}W_t^i$; note that here $\mathrm{d}\zeta$ is
varied and so the third component of $\xi_i$ is treated as a variable in the action, whereas
the horizontal components are treated as fixed parameters. The final term on the top
line enforces incompressibility, and the Lagrange multiplier $\mathrm{d}P$ must be stochastic
since a semi-martingale Lagrange multiplier is required to enforce a condition on
the semi-martingale D [see SC21]. On the second line the quantities $\alpha$, $\beta$, $\gamma$ are
Lagrange multipliers enforcing the fact that D, T, S are advected quantities. The
final line contains the modifications required to include SALT or SFLT; we shall
not in practice use both SALT and SFLT together, but for compactness of the
presentation we include them together here. The first modification, giving SALT,
consists of a sum of $N_\xi$ Hamiltonians multiplied by Stratonovich noise. The second,
additional term is a Lagrange-d'Alembert term which introduces a shift F in the
momentum. We remark that by including further Lagrange-d'Alembert terms such
as $\langle \delta\alpha, \mathrm{d}F_D \rangle$ or $\langle \delta\beta, \mathrm{d}F_T \rangle$ etc. we may add arbitrary forcings to the right-hand side
of the equations for the advected tracers. However, we do not consider this here.


The equations resulting from the variational principle 5S = 0 are: 


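$$\begin{aligned}
\delta m^{(H)} &: \quad \mathrm{d}\chi^{(H)} = \frac{\delta h}{\delta m^{(H)}}\,\mathrm{d}t + \sum_i \frac{\delta h_i}{\delta m^{(H)}}\circ\mathrm{d}W_t^i; && \text{(5a)} \\
\delta m_3 &: \quad \mathrm{d}\chi_3 = \mathrm{d}\zeta; && \text{(5b)} \\
\delta\mathrm{d}\chi &: \quad m = \alpha\diamond D + \beta\diamond T + \gamma\diamond S + F; && \text{(5c)} \\
\delta\alpha &: \quad (\mathrm{d} + \mathcal{L}_{\mathrm{d}\chi})\,D = 0; && \text{(5d)} \\
\delta\beta &: \quad (\mathrm{d} + \mathcal{L}_{\mathrm{d}\chi})\,T = 0; && \text{(5e)} \\
\delta\gamma &: \quad (\mathrm{d} + \mathcal{L}_{\mathrm{d}\chi})\,S = 0; && \text{(5f)} \\
\delta D &: \quad (\mathrm{d} + \mathcal{L}_{\mathrm{d}\chi})\,\alpha = -\Big(\mathrm{d}P + \frac{\delta h}{\delta D}\,\mathrm{d}t\Big); && \text{(5g)} \\
\delta T &: \quad (\mathrm{d} + \mathcal{L}_{\mathrm{d}\chi})\,\beta = -\frac{\delta h}{\delta T}\,\mathrm{d}t; && \text{(5h)} \\
\delta S &: \quad (\mathrm{d} + \mathcal{L}_{\mathrm{d}\chi})\,\gamma = -\frac{\delta h}{\delta S}\,\mathrm{d}t; && \text{(5i)} \\
\delta\mathrm{d}P &: \quad D = 1; && \text{(5j)} \\
\delta\mathrm{d}\zeta &: \quad m_3 = 0. && \text{(5k)}
\end{aligned}$$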

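The diamond in Eq. (5c) is a binary operator acting on two variables that are dual
with respect to the inner product $\langle\cdot,\cdot\rangle$ (e.g. a scalar and a density) and giving a 1-form
density. Explicitly, for two dual variables p, q and an arbitrary vector field X,
the diamond is defined by the relation $\langle p \diamond q, X \rangle = -\langle p, \mathcal{L}_X q \rangle$. We can compute
these explicitly as follows:

$$\frac{\delta h}{\delta D}\diamond D = D\,\nabla\frac{\delta h}{\delta D}, \qquad \frac{\delta h}{\delta T}\diamond T = -\frac{\delta h}{\delta T}\,\nabla T, \qquad \frac{\delta h}{\delta S}\diamond S = -\frac{\delta h}{\delta S}\,\nabla S. \tag{6}$$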
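
The expressions in Eq. (6) follow directly from the definition of the diamond operator. The derivation below is a sketch under the assumption that boundary terms vanish on integration by parts; $p$ stands for a scalar paired with the density $D$, and $\beta$ for a density paired with the scalar $T$.

```latex
\begin{align*}
\langle p \diamond D, X \rangle
  &= -\langle p, \mathcal{L}_X D \rangle
   = -\int p\,\nabla\cdot(X D)\,d^3x
   = \int D\,X\cdot\nabla p\,d^3x
  &&\Longrightarrow\quad p \diamond D = D\,\nabla p, \\
\langle \beta \diamond T, X \rangle
  &= -\langle \beta, \mathcal{L}_X T \rangle
   = -\int \beta\,X\cdot\nabla T\,d^3x
  &&\Longrightarrow\quad \beta \diamond T = -\beta\,\nabla T .
\end{align*}
```

Setting $p = \delta h/\delta D$ and $\beta = \delta h/\delta T$ recovers the first two relations in Eq. (6); the salinity term is identical in form to the temperature term.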


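We note that the form of $\mathrm{d}\chi$ as given in Eq. (3) is not an input to the variational
principle, but a consequence of it. Indeed, we obtain Eq. (3) by defining
$\mathbf{v} := \big(\frac{\delta h}{\delta m^{(H)}}, w\big)$ and $\xi_i := \big(\frac{\delta h_i}{\delta m^{(H)}}, \xi_i^{(3)}\big)$. The horizontal velocity is therefore
$\mathbf{u} = \frac{\delta h}{\delta m^{(H)}} = \frac{m^{(H)}}{D} - R$. The fact that D = 1, combined with Eq. (5d) gives the incompressibility
condition $\nabla\cdot\mathrm{d}\chi = \nabla^{(H)}\cdot\mathrm{d}\chi^{(H)} + \frac{\partial\,\mathrm{d}\zeta}{\partial z} = 0$. By Doob-Meyer decomposition
[Doo53, Mey62, Mey63], we can split the incompressibility condition into its drift
part and stochastic oscillations. Thus we are able to compute $w$, $\xi_i^{(3)}$ in terms of $\mathbf{u}$
and $\xi_i^{(H)}$ respectively:

$$\nabla^{(H)}\cdot\mathbf{u} + \frac{\partial w}{\partial z} = 0, \qquad \nabla^{(H)}\cdot\xi_i^{(H)} + \frac{\partial \xi_i^{(3)}}{\partial z} = 0. \tag{7}$$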

Boundary conditions at z = 0 allow us to integrate Eq. (7) in the vertical direction. 
To obtain the momentum equation we apply (d + Lax) to both sides of Eq. (5c) and 
use the fact that the Lie derivative obeys a Leibniz rule with respect to the diamond 
operator. After some re-arranging, we obtain: 


m* — F bh aV 
(d+ Lay) = ax) =-d (3 — v) dt + ar) + g et. (8) 


We shall show in Sect. 2.2 that the SFLT terms will conserve energy if we
require that the momentum shift F takes a particular form, which is that it satisfies
$(\mathrm{d} + \mathcal{L}_{\mathrm{d}\chi})\,F = \mathcal{L}_{\mathbf{v}}\,\mathrm{d}\Phi$, for some stochastic process $\mathrm{d}\Phi$. In this work, we shall
assume further that $\mathrm{d}\Phi$ has the form $\mathrm{d}\Phi = \sum_I \phi_I\circ\mathrm{d}B_t^I$ for some spatially
dependent parameters $\phi_I$, with $B_t^I$ being a set of independent Brownian motions.
Because the momentum $m = (m^{(H)}, 0)$ has only horizontal components, we shall
assume that $\phi_I$ also have only horizontal components. Moreover, we can expand
the pressure in terms of its drift component and Brownian increments:
$\mathrm{d}P = p\,\mathrm{d}t + \sum_i p_i\circ\mathrm{d}W_t^i + \sum_I p_I\circ\mathrm{d}B_t^I$. Thus, writing $m = \mathbf{u} + R$ and expanding
$\mathrm{d}\chi$ in terms of $\mathbf{v}$ and $\xi_i$, we find that Eq. (8) becomes:


p Vp +g(1 + b)3] dt 


TA u) + f2 x é;+Vé;-u+V (pi +; - R)] odw; 6) 


"ay - (vo) — Vo, -v—V (pr —v- ;)] 0 dB; =0. 


The first line of Eq. (9) contains the terms of the deterministic momentum equation, 
the second line contains the SALT terms and the final line contains the SFLT 
contributions. Equation (9) is a three-dimensional equation, but the third component 
is the (diagnostic) hydrostatic balance condition rather than a prognostic evolution 
equation for w. In the cases of SALT and SFLT hydrostatic balance includes 
additional constraints on the stochastic parts of the pressure dP: 


dp ap; ə; dp’ dg; 

£ = —g(l1 +b), hed a a ee A ay 10 

dz ree dz dz dz az oo 
where we have the definitions of the shifted stochastic pressure terms $\tilde{p}_i := p_i + \xi_i\cdot R$
and $\tilde{p}_I := p_I - \mathbf{v}\cdot\phi_I$. We solve Eq. (10) by imposing the following surface
pressure boundary conditions:

$$p|_{z=0} = g\eta, \qquad \tilde{p}_i|_{z=0} = \psi_i, \qquad \tilde{p}_I|_{z=0} = \psi_I, \tag{11}$$

where $\eta$ is the free surface height. The boundary condition on p is that used in the
linear free surface approximation, which is employed in FESOM2 [DSWJ16]. $\psi_i$
and $\psi_I$ are functions only of the horizontal direction and are arbitrary. They may be
used to introduce some stochastic atmospheric forcing at the ocean surface, but we
do not consider this in the present work. For simplicity we shall set $\psi_i = \psi_I = 0$
for all i, I. Solving Eq. (10) with the boundary conditions in Eq. (11) gives us the
following:


0 
reno bak (12a) 
-udz’, (12b) 


0 
ð 
p, =v +f 5 -vdz'. (12c) 
z Us 
A more exact condition on the deterministic pressure would be $p|_{z=\eta} = 0$. Using
this gives almost the same result for p except that the upper limit of the integral will
instead be $\eta$.


The equation for the evolution of the free surface height $\eta$ is obtained by
integrating the incompressibility condition and using appropriate surface boundary
conditions. For the linear free surface approximation we take $w|_{z=0}\,\mathrm{d}t = \mathrm{d}\eta$; at
the bottom boundary $z = -H(x, y)$ we have $\mathrm{d}\chi|_{z=-H}\cdot\nabla(z + H) = 0$. Thus,
integrating the incompressibility condition in the vertical direction from $z = -H$ to
$z = 0$ we find, in the linear free surface case:

$$\mathrm{d}\eta + \nabla\cdot\int_{-H}^{0} \mathbf{u}\,\mathrm{d}t\,dz = 0. \tag{13}$$

Again, the more exact boundary condition would be $\mathrm{d}\chi|_{z=\eta}\cdot\nabla(z - \eta) = \mathrm{d}\eta$ and in
this case Eq. (13) is modified by $\mathbf{u}\,\mathrm{d}t \to \mathbf{u}\,\mathrm{d}t + \sum_i \xi_i\circ\mathrm{d}W_t^i$ and the upper limit of
the integral will be $\eta$ rather than 0. However, for our numerical simulations we use
the linear free surface.


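From Eqs. (5e) and (5f) we have the advection equations:

$$\mathrm{d}T + \mathbf{v}\cdot\nabla T\,\mathrm{d}t + \sum_i \xi_i\cdot\nabla T\circ\mathrm{d}W_t^i = 0, \tag{14}$$

$$\mathrm{d}S + \mathbf{v}\cdot\nabla S\,\mathrm{d}t + \sum_i \xi_i\cdot\nabla S\circ\mathrm{d}W_t^i = 0, \tag{15}$$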


for temperature and salinity respectively. The horizontal component of the momen- 
tum equation (9), along with the solutions Eq. (10) for pressure (with the equation of 
state b = b(T, S, z)), the incompressibility conditions Eq. (7), the tracer advection 
equations (14) and (15) and the linear free surface equation (13) give us a complete 
set of fluid equations, the Primitive Equations with SALT and SFLT. 


2.2 Conservation Laws 


The key benefit of the SALT and SFLT frameworks is that they retain some of 
the fundamental conservation properties possessed by the deterministic equations. 
By writing the Primitive Equations in the geometric form given in Eqs. (5d)—(5f) 
and (8), we may demonstrate the effect of the stochastic frameworks on these 
conservation laws. First, we consider energy conservation. The total energy is equal 
to the Hamiltonian, as given in Eq. (2). For convenience of notation, we define
$\tilde{h}(m, D, T, S, w) = h(m^{(H)}, D, T, S) + \langle m_3, w \rangle$. $\tilde{h}$ and $h$ are equal on solutions of
the equations, but we have $\frac{\delta\tilde{h}}{\delta m} = \mathbf{v}$. By direct calculation, the time evolution of the
energy is given by:
dh =) Ie Yom) + (sa +b), s9) o dW;


i : (16) 
- (a+ cope] : 


ôm 


Thus, the energy conservation property is violated by the stochastic terms. The two
terms on the right-hand side of the pairing in Eq. (16) come from SALT and SFLT
respectively. However, as shown in [HH21], the energy deviation from SFLT can be
nullified by choosing $(\mathrm{d} + \mathcal{L}_{\mathrm{d}\chi})\,F = \sum_I \mathcal{L}_{\mathbf{v}}\phi_I\circ\mathrm{d}B_t^I$ for some parameters $\phi_I(x)$.
Indeed, by the anti-symmetry of the vector field commutator:


bh bh 
(3. Dees oa!}=([ x]. Zoroa) <0, (17) 


where the square bracket $[\cdot,\cdot]$ denotes the commutator of vector fields. Thus, energy
conservation is broken by SALT but preserved by a class of stochastic forcing in
SFLT. In the remainder of the paper, we shall assume the stochasticity introduced
by SFLT is in the energy preserving form.


The next conservation law we consider is the Kelvin circulation theorem. The 
evolution of the circulation corresponding to Eq. (8) is given by: 


m” i 
ag "dx= -g$ WT, S,z)dzdr + > G (curl @; x v)- dx odB, , 
ca P Ct) T ICU) 


(18) 


where C(t) is a closed loop moving with the transport velocity dx. We see that 
SALT affects the circulation theorem only by modifying the advection of the loop; 
thus the circulation theorem for SALT is the same as in the deterministic case, but 
with the circulation considered around a stochastically-transported loop. Therefore, 
circulation is generated only by buoyancy gradients being misaligned with the 
vertical direction. In SFLT, on the other hand, there are additional forces introduced, 
which generate the circulation of fluid momentum. 


The evolution of potential vorticity associated with Eq. (8) can be expressed as 


1 ðb z 1 I 
(d+dx-V)q= 52 V (Zax ) + pve Lv (vær) — wr - Vv] o dB}, 
(19) 
where w := curl (m’" / D) is the relative vorticity, @y = curl@, is the stochastic 


vorticity generated by SFLT and g := bow - Vb is the potential vorticity. Similar 
to the Kelvin circulation theorem, SALT introduces stochasticity in the transport 
velocity $\mathrm{d}\chi$, while SFLT introduces stochastic forces that act on the advection of
fluid potential vorticity. If we assume that the buoyancy has no explicit dependence
on the vertical coordinate, i.e. $\partial b/\partial z = 0$, then q is purely advected by the flow in the
absence of SFLT.


3 Calibration of the Stochastic Parameters 


3.1 Lagrangian Paths 


In order to calibrate the parameters $\xi_i$ used in SALT we propose to use the method
of Lagrangian paths introduced in [CCH+19, CCH+20].

First, we perform a fine-grid model run, which we shall take to be the ‘truth’.
From this run we obtain an output velocity $v_f(x,t)$ saved at times
$t \in \{t_1,\ldots,t_{N-1},t_N\}$, where the time interval between subsequent sample times,
$t_{i+1} - t_i$, is greater than the velocity decorrelation time, defined to be the smallest
$\tau$ at which the auto-correlation function $C(\tau)$ is less than $e^{-1}$. Suppose the
fine-grid resolution is $M$ times that of the coarse grid, in which case the coarse-grid
time step is given by $\Delta t_c := M\Delta t_f$, where $\Delta t_f$ is the time step for the
fine-grid model run. In order to compute Lagrangian paths we also save $v_f(x,t)$ at
$t \in \{t_i, t_i + \Delta t_f, \ldots, t_i + (M-1)\Delta t_f\}$ for each $i = 1,\ldots,N$.
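
The decorrelation criterion above can be illustrated with a minimal sketch. The snippet below estimates the first lag at which a sample auto-correlation drops below $e^{-1}$; the synthetic signal, the function names and the simple (biased) estimator are assumptions made for illustration and are not taken from the model code.

```python
import math

def autocorrelation(series, lag):
    """Sample auto-correlation of a 1D series at an integer lag (biased estimator)."""
    n = len(series)
    mean = sum(series) / n
    var = sum((s - mean) ** 2 for s in series)
    cov = sum((series[k] - mean) * (series[k + lag] - mean) for k in range(n - lag))
    return cov / var

def decorrelation_lag(series, threshold=math.exp(-1.0)):
    """Smallest lag (in samples) at which the auto-correlation falls below threshold."""
    for lag in range(1, len(series)):
        if autocorrelation(series, lag) < threshold:
            return lag
    return None  # the record never decorrelates

# Hypothetical velocity record at a single grid point (a slow oscillation).
signal = [math.cos(0.25 * k) for k in range(200)]
lag = decorrelation_lag(signal)
```

Multiplying the returned lag by the spacing of the record gives an estimate of the decorrelation time, which the sampling interval $t_{i+1} - t_i$ should exceed.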

To obtain the corresponding coarse-grid velocity $\bar v(x,t)$ from $v_f(x,t)$, we apply
a coarse-graining operator to $v_f(x,t)$, which consists of a local average over fine-grid
points and yields a velocity defined on the coarse grid. Considering a
distribution of tracer particles whose initial positions $x_0^r$ are the (three-dimensional)
coordinates of the coarse-grid nodes (enumerated by $r$), we compute Lagrangian
paths on the fine grid and coarse grid respectively:


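\[ x_f^r(t_i + M\Delta t_f) := x_0^r + \sum_{m=0}^{M-1} v_f\big(x_f^r(t_i + m\Delta t_f),\, t_i + m\Delta t_f\big)\,\Delta t_f, \tag{20a} \]
\[ x_c^r(t_i + M\Delta t_f) := x_0^r + \bar v\big(x_c^r(t_i),\, t_i\big)\,\Delta t_c, \tag{20b} \]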


where $x_f^r(t_i + M\Delta t_f)$ and $x_c^r(t_i + M\Delta t_f)$ are the Lagrangian paths computed as
integral curves of $v_f$ and $\bar v$ respectively; the integral is carried out over one coarse-grid
time-step, which is equivalent to $M$ fine-grid time steps. We can then define
the difference $\Delta x_{r,i} = \Delta x(t_i, x_0^r) := x_f^r(t_i + M\Delta t_f) - x_c^r(t_i + M\Delta t_f)$ and apply
the method of [HJS07] to compute the Empirical Orthogonal Functions (EOFs). To
summarise, we subtract off the time mean to define $\Delta x'_{r,i} = \Delta x_{r,i} - \frac{1}{N}\sum_{j=1}^{N}\Delta x_{r,j}$.
In the x-direction we then have a matrix with components $\Delta x'^{(x)}_{r,i}$. From this we
construct the matrix $A^{(x)}$, which has components $A^{(x)}_{rs} = \frac{1}{N}\sum_{i}\Delta x'^{(x)}_{r,i}\,\Delta x'^{(x)}_{s,i}$. The
EOFs in the x-direction are then defined to be the eigenvectors of the matrix $A^{(x)}$,
which we denote as $a_i^{(x)}$, for $i = 1,\ldots,N$. They are normalised in the sense that
$\sum_x a_i^{(x)}(x)\,a_j^{(x)}(x) = \delta_{ij}$, where the sum is over all grid points. We apply the same
process to the y-component $\Delta x'^{(y)}_{r,i}$ to obtain $N$ eigenvectors in the y-direction, which
we denote $a_i^{(y)}$. We do not compute the eigenvectors for the z-direction since these
will be obtained from the incompressibility condition.
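
As an illustration of Eqs. (20a) and (20b), the following sketch computes the path differences and their time-mean-removed anomalies for a one-dimensional toy problem. The analytic velocity fields, the node positions and all function names are hypothetical; in practice the velocities would be interpolated from saved model output.

```python
import math

def fine_path(x0, t0, v_fine, dt_f, M):
    """Eq. (20a): M forward-Euler steps of size dt_f along the fine-grid velocity."""
    x = x0
    for m in range(M):
        x += v_fine(x, t0 + m * dt_f) * dt_f
    return x

def coarse_path(x0, t0, v_coarse, dt_f, M):
    """Eq. (20b): a single step of size dt_c = M * dt_f along the coarse velocity."""
    return x0 + v_coarse(x0, t0) * (M * dt_f)

def path_differences(nodes, times, v_fine, v_coarse, dt_f, M):
    """Return the differences Delta x_{r,i} and the same with the time mean removed."""
    diffs = [[fine_path(x0, t, v_fine, dt_f, M) - coarse_path(x0, t, v_coarse, dt_f, M)
              for t in times] for x0 in nodes]
    anomalies = []
    for row in diffs:
        mean = sum(row) / len(row)
        anomalies.append([d - mean for d in row])
    return diffs, anomalies

# Hypothetical 1D velocities: the "fine" field carries a small-scale wiggle
# that the "coarse" field lacks.
def v_coarse(x, t):
    return 1.0

def v_fine(x, t):
    return 1.0 + 0.5 * math.sin(8.0 * x + t)

nodes = [0.1 * r for r in range(5)]   # coarse-grid nodes x_0^r
times = [2.0 * i for i in range(6)]   # sample times t_i
diffs, anomalies = path_differences(nodes, times, v_fine, v_coarse, dt_f=0.01, M=2)
```

Each row of `anomalies` corresponds to one node $r$ and each column to one sample time $t_i$; this is the array from which the covariance matrix of the EOF analysis would be built.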


We remark that the method we have used here, in which we compute the EOFs
of each component of $\Delta x$ separately, is different from the method found in other
sources [e.g. HLB96], in which the components are computed together and we
obtain a set of two-component eigenvectors $a_i$ immediately, with one eigenvalue
$\lambda_i$ corresponding to each of these EOFs. However, this method was attempted for
SALT runs in the current set-up and the resulting model runs were less successful.
For this reason we have chosen to compute the components separately.


Thus, in our case we have $N$ eigenvectors in each of the horizontal directions
and these will have associated eigenvalues $\lambda_i^{(x)}$ and $\lambda_i^{(y)}$. We define the horizontal
components of $\xi_i$ by a re-scaling of these eigenvectors. The magnitude of the
eigenvalue $\lambda_i^{(x)}$ gives an indication of how much of the variance is captured by
the corresponding eigenvector. Therefore, we choose to scale the parameters so
that $\langle \xi_i^{(x)}, \xi_i^{(x)} \rangle \propto \lambda_i^{(x)}$. Moreover, in order to ensure that the different methods for
computing $\xi_i$ may be compared fairly, we require that the $L^2$-norm of the sum be
the same for each method. Thus we impose the following:


\[ \frac{1}{V_{tot}} \sum_{i=1}^{N_\xi} \left\langle \xi_i^{(h)}, \xi_i^{(h)} \right\rangle = \gamma^2 \tag{21} \]


where $\gamma$ is a constant with units m s$^{-1/2}$, which we shall choose later; $V_{tot}$ is the
total volume of the domain, and $N_\xi \le N$ is the number of EOFs we choose to
keep for our model runs. The total integral, denoted by angle brackets, is defined by
$\langle a, b \rangle := \sum_x a(x)\cdot b(x)\,V(x)$. We can achieve the required properties by choosing
the following scaling:


\[ \xi_i^{(x)}(x) = \gamma \sqrt{\frac{\lambda_i^{(x)}\, V_{tot}}{\lambda_{tot}\, V(x)}}\; a_i^{(x)}(x) \tag{22} \]

where $V(x)$ is the volume of the grid cell located at $x$ and we have defined
$\lambda_{tot} := \sum_{i=1}^{N_\xi} \left(\lambda_i^{(x)} + \lambda_i^{(y)}\right)$. After computing the horizontal components in this way, they are
then taken to zero near the boundaries in order to enforce the impermeability
condition at the boundary, $\xi_i^{(h)}\cdot n = 0$, where $n$ is the normal to the boundary and
$\xi_i^{(h)} = (\xi_i^{(x)}, \xi_i^{(y)})$ is the horizontal part of $\xi_i = (\xi_i^{(x)}, \xi_i^{(y)}, \xi_i^{(z)})$.
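
The re-scaling can be checked numerically with a small sketch. It assumes the scaling of Eq. (22) takes the form $\xi_i^{(x)}(x) = \gamma\sqrt{\lambda_i^{(x)} V_{tot} / (\lambda_{tot} V(x))}\,a_i^{(x)}(x)$, and the four-cell grid, eigenvalues and function names are hypothetical. Under that assumption the volume-averaged sum of squared norms equals $\gamma^2$, as required by Eq. (21).

```python
import math

def inner(a, b, cell_vol):
    """Volume-weighted inner product <a, b> = sum_x a(x) b(x) V(x)."""
    return sum(ai * bi * vi for ai, bi, vi in zip(a, b, cell_vol))

def rescale_eofs(eofs_x, eofs_y, lam_x, lam_y, cell_vol, gamma):
    """Scale unit-norm EOFs into parameters xi_i (assumed form of Eq. (22))."""
    v_tot = sum(cell_vol)
    lam_tot = sum(lam_x) + sum(lam_y)

    def scale(eofs, lams):
        return [[gamma * math.sqrt(lam * v_tot / (lam_tot * vol)) * a
                 for a, vol in zip(eof, cell_vol)]
                for eof, lam in zip(eofs, lams)]

    return scale(eofs_x, lam_x), scale(eofs_y, lam_y)

# Hypothetical data: four grid cells, two orthonormal EOFs per direction.
eofs_x = [[0.5, 0.5, 0.5, 0.5], [0.5, -0.5, 0.5, -0.5]]
eofs_y = [[0.5, 0.5, -0.5, -0.5], [0.5, -0.5, -0.5, 0.5]]
lam_x, lam_y = [3.0, 1.0], [2.0, 0.5]
cell_vol = [1.0, 2.0, 1.5, 0.5]
gamma = 2.0e-3

xi_x, xi_y = rescale_eofs(eofs_x, eofs_y, lam_x, lam_y, cell_vol, gamma)
total = sum(inner(xi, xi, cell_vol) for xi in xi_x + xi_y)
mean_square = total / sum(cell_vol)   # should reproduce gamma**2
```

The individual norms $\langle \xi_i^{(x)}, \xi_i^{(x)} \rangle$ are proportional to the eigenvalues, so the leading EOFs retain the largest amplitude after re-scaling.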
For the z-component we use the incompressibility condition Eq. (7) along with
the impermeability condition $\xi_i\cdot\nabla(z + H) = 0$ at the lower boundary $z = -H$ to
obtain:


\[ \xi_i^{(z)} = -\nabla^{(h)}\cdot\int_{-H}^{z} \xi_i^{(h)}\,\mathrm{d}z', \tag{23} \]


where $\nabla^{(h)} = (\partial_x, \partial_y)$ is the horizontal gradient. This method for computing the
vertical component of $\xi_i$ is applicable to any system of fluid equations with an
incompressibility condition. We could, alternatively, compute all three components
of $\xi_i$ as EOFs of the three components of $\Delta x$. However, the resulting three-component
vector $\xi_i$ will not be guaranteed to be divergence-free. We would then
need to subtract off the divergent part, $\xi_i \to \tilde\xi_i = \xi_i - \nabla\Delta^{-1}(\nabla\cdot\xi_i)$, where $\Delta^{-1}$
is the inverse Laplacian. However, computing the divergent part of the vector $\xi_i$
is computationally expensive; moreover, the components of $\tilde\xi_i$ computed in this
way will not be guaranteed to be orthogonal with respect to $\langle\cdot,\cdot\rangle$. Thus in this
paper we consider only the $\xi_i$ for which the vertical components are computed from
integrating the incompressibility condition.
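
A minimal sketch of the vertical integration, for a single water column over a flat bottom, is given below. The layer-wise finite-volume discretisation, the sample values and the function name are assumptions for illustration; the discretisation used in FESOM2 may differ.

```python
def vertical_component(div_h, dz):
    """Integrate d(xi_z)/dz = -div_h upward from xi_z = 0 at the flat bottom.

    div_h[k] and dz[k] are the horizontal divergence and the thickness of layer k,
    ordered from the bottom layer upward. Returns xi_z at the len(dz) + 1 interfaces.
    """
    xi_z = [0.0]  # impermeability at z = -H
    for d, h in zip(div_h, dz):
        xi_z.append(xi_z[-1] - d * h)
    return xi_z

# Hypothetical column: three layers with given horizontal divergence.
div_h = [0.2, -0.1, 0.05]
dz = [100.0, 50.0, 10.0]
xi_z = vertical_component(div_h, dz)
```

By construction the discrete three-dimensional divergence, the horizontal divergence plus the vertical difference of `xi_z` across each layer, vanishes in every layer.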


3.2 Eulerian Differences 


To calibrate the parameters $\phi_i$ used in SFLT we propose an alternative method
using differences in fixed Eulerian coordinates. Consider the deterministic
momentum equation given by:


(d+Z£ ) (m) =- sient ornk 0 (24) 
ve PTSD ôT 5S i 


and the SFLT equation: 


h\ .. 6h -o 8h - 
(d+ Lear) (mm ") - Da (@) =~ (p+ =) o Da apo bat — 0 Sat, 
(25) 


where the notation $\tilde{(\cdot)}$ is used on the variables of the SFLT equations to emphasise
the difference between deterministic and stochastic variables. The goal of the
stochastic parameterisation is to decompose the “true” fluid flow into a slow drift
component and a rapidly fluctuating component whose amplitude can be estimated
from data. In the example of estimating the momentum fluctuation $\mathrm{d}\Phi$ of $m^h$, we
denote the slow drift component as $\tilde m^h$ and we seek the solution to the minimisation
problem


\[ \min \left\| \mathrm{d}m^h - \mathrm{d}\tilde m^h \right\|^2 . \tag{26} \]

Assuming D, T and S do not have rapidly fluctuating components, the minimisation
problem becomes


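\[ \min \left\| \mathcal{L}_{v}\big(m^h\,\mathrm{d}t\big) - \mathcal{L}_{\tilde v}\big(\tilde m^h\,\mathrm{d}t - \mathrm{d}\Phi\big) \right\|^2 . \tag{27} \]

We see that this minimisation problem can be solved by taking
$\mathrm{d}\Phi = (\tilde m^h - m^h)\,\mathrm{d}t = (\tilde u - u)\,\mathrm{d}t$. Therefore, we define the differences


\[ \Delta x_{r,i} := \Delta x(t_i, x_0^r) = \big[u(t_i, x_0^r) - \bar u(t_i, x_0^r)\big]\,\Delta t_c \tag{28} \]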


for $i = 1,\ldots,N$. We then assume the expansion $\mathrm{d}\Phi = \sum_i \phi_i \circ \mathrm{d}B_t^i$. As
before, we subtract the time-mean to obtain $\Delta x'_{r,i} = \Delta x_{r,i} - \frac{1}{N}\sum_{j=1}^{N}\Delta x_{r,j}$ and
then compute the EOFs exactly as we did in Sect. 3.1 to get our parameters $\phi_i$.


In both methods we initially compute horizontal components of the stochastic
parameters using EOFs, but for SALT there is the additional step of integrating
the incompressibility condition to obtain the vertical component. The vertical
component is not needed for $\phi_i$ since it is a part of the decomposition of the fluid
momentum $m$, the vertical part of which vanishes in the Primitive Equations. In
fully three-dimensional models in which the vertical component of the momentum
is non-zero, the Eulerian differences of the momenta will be a three-dimensional
object and one can compute all three components of the parameters $\phi_i$ using EOFs.

We can also consider using Eulerian differences as an option for $\xi_i$ in SALT.
This effectively means approximating the fine-grid Lagrangian path by taking only
one time-step in the coarse grid: $x_f^r \approx x_0^r + v_f(x_0^r, t)\,\Delta t_c$. We can expect that this will
be a reasonable approximation for small $M$, but for larger $M$ the Lagrangian paths
method will diverge from the Eulerian differences. In our numerical investigations
in SALT we shall consider $\xi_i$ computed from both the Lagrangian paths method
and the Eulerian differences method. For SFLT we also consider $\phi_i$ computed from
Lagrangian paths (for completeness) as well as those computed by Eulerian
differences as described above.


4 Results 


We solve the Primitive Equations using the FESOM2 code on a rectangular domain
[0, 40°] × [30°, 60°] × [0, −H], where H = 1600 m is the depth of the domain and
the bathymetry is flat. Impermeability conditions are imposed at all boundaries. The
model is spun up for three years from zero initial velocity and an initial temperature
profile given by $T(z) = T_0 + \frac{\delta}{\rho_0\alpha}\left((1-\beta)\tanh\left(\frac{z}{z_0}\right) + \beta\frac{z}{H}\right)$, which is based on
the test case described in [RDH+12, SDB+16]. We take $T_0 = 25\,^\circ$C, $\beta = 0.05$,
$\delta = 5$ kg m$^{-3}$, $z_0 = 300$ m, $\rho_0 = 1030$ kg m$^{-3}$, and $\alpha = 0.00025$ K$^{-1}$. For simplicity,
salinity is kept constant and we use a linear equation of state which depends only on
temperature: $b = -\alpha(T - 10\,^\circ\mathrm{C})$. The flow is driven by a wind forcing in the upper
layer given by t(x, y) = Ae cos (725) $, where Azo = 10m is the thickness 
of the upper layer; $\tau_0 = 0.2$ m s$^{-2}$ is the wind strength. The vertical discretisation
consists of 23 layers, with layer thicknesses increasing with depth. For the horizontal
discretisation we take a fine grid of spacing 1/4° and a coarse grid of spacing
1/2°. At the latitudes we are considering, 1/4° corresponds to an eddy-permitting
model, while 1/2° may be considered non-eddy resolving [see Hal13]. We run the
deterministic model on the fine grid and the coarse grid, and carry out the SALT
and SFLT runs on the coarse grid. All coarse-grid runs are begun from the same
initial condition, being the final time snapshot after the three-year spin-up period;
the fine-grid run is begun from the end of the three-year spin-up on the fine grid.
We save data in each case at intervals of 15 days, over a time period of 10 years,
for a total of 240 snapshots. From the fine grid data we have the ‘truth’ velocity
$v_f$. To this we apply a coarse-graining to obtain $\bar v$; we then follow the procedures outlined in
Sect. 3 to compute $\xi_i$ and $\phi_i$. However, there is no canonical choice for how the
coarse-graining should be done. We consider a filter defined by an equally-weighted
nine-point average over nearest neighbours, and we denote this filter $F$; this filter,
applied once, has a width equal to the spacing on the coarse grid, i.e. 1/2°. The
coarse-graining will then be done by applying this filter $N_{filt}$ times successively,
then projecting onto the coarse grid. Thus, the smoothing filter applied $N_{filt}$ times
will be denoted $F^{N_{filt}}$; this has a width $N_{filt}/2$ degrees with a stronger weighting
for points closer to the centre of the filter. We consider the cases $N_{filt} = 1, 4, 32$.
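
The coarse-graining just described can be sketched as follows. The treatment of boundary points (averaging over whichever neighbours exist) and the projection by sub-sampling every second fine-grid point are assumptions of this sketch; the text above does not specify either.

```python
def nine_point_filter(field):
    """Equally-weighted average over each point and its nearest neighbours.

    Neighbours outside the domain are skipped, so edge points average over
    fewer than nine values (an assumption of this sketch).
    """
    ny, nx = len(field), len(field[0])
    out = [[0.0] * nx for _ in range(ny)]
    for j in range(ny):
        for i in range(nx):
            vals = [field[jj][ii]
                    for jj in range(max(j - 1, 0), min(j + 2, ny))
                    for ii in range(max(i - 1, 0), min(i + 2, nx))]
            out[j][i] = sum(vals) / len(vals)
    return out

def coarse_grain(field, n_filt, stride=2):
    """Apply the filter n_filt times, then keep every stride-th fine-grid point."""
    for _ in range(n_filt):
        field = nine_point_filter(field)
    return [row[::stride] for row in field[::stride]]

# Hypothetical fine-grid field: a single spike on an 8 x 8 grid.
spike = [[0.0] * 8 for _ in range(8)]
spike[4][4] = 1.0
coarse_once = coarse_grain(spike, n_filt=1)
coarse_four = coarse_grain(spike, n_filt=4)
```

Repeated application widens the effective stencil while weighting central points more strongly, which is the behaviour attributed to $F^{N_{filt}}$ above.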
From the deterministic model run, we have velocities saved at 240 time snapshots,
so we can use these to compute 240 EOFs. We do this for both the Lagrangian
paths method and the Eulerian differences method, for each of the three choices of
$N_{filt}$; this gives a total of six sets of parameters. In our model runs we shall choose
to keep $N_\xi = N_\phi = 32$ of these parameters for each run. In Fig. 1 we plot the
square-root of the sum of the squares of these parameters (before re-scaling by $\gamma$)
as a field in space. From Fig. 1 it appears the differences between Lagrangian paths
and Eulerian differences are minimal. We remark that here the time-steps on the fine
and coarse grids differ only by a factor of 2; it is expected that if a bigger difference
in resolution is used, then more steps will be needed in computing the Lagrangian
paths and therefore the corresponding parameters will differ more substantially. The
number of times we apply the smoothing operator, however, has a much greater
effect and we see significantly different fields with $N_{filt} = 32$ than we do with
$N_{filt} = 4$ or $N_{filt} = 1$. Indeed, it appears from Fig. 1 that the weaker filter causes
the parameters to be more strongly concentrated around the western boundary,
whereas for the stronger filter the parameters are spread more across the domain.
The cumulative spectra of the EOFs are shown in Fig. 2. These spectra show us
how many EOFs are needed to capture a given percentage of the total variability;
or conversely, how much variance is captured by a given number of EOFs. We
show in each case how much variability is captured by using 32 EOFs. In all cases
the Lagrangian paths method captures slightly more of the variability, though
the difference is small, especially for the smaller values of $N_{filt}$. A much larger
share of the variability is captured, however, in the $N_{filt} = 32$ case when compared with the
$N_{filt} = 1$ case.

Fig. 2 Eigenvalue spectra of zonal $\xi_i$, plotted for three different values of $N_{filt}$. On each panel is
shown the spectrum for the EOFs calculated by Lagrangian trajectories and Eulerian differences.
The horizontal lines show what percentage of the total variance is captured by choosing $N_\xi = 32$
EOFs
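
The quantity plotted in a cumulative spectrum can be computed directly from the eigenvalues, as in the following sketch. The eigenvalues used here are made up for illustration and the function names are hypothetical.

```python
def cumulative_variance(eigenvalues):
    """Cumulative fraction of the total variance captured by the leading EOFs."""
    lams = sorted(eigenvalues, reverse=True)
    total = sum(lams)
    fractions, running = [], 0.0
    for lam in lams:
        running += lam
        fractions.append(running / total)
    return fractions

def eofs_needed(eigenvalues, target):
    """Smallest number of leading EOFs whose variance fraction reaches target."""
    for count, frac in enumerate(cumulative_variance(eigenvalues), start=1):
        if frac >= target:
            return count
    return len(eigenvalues)

# Hypothetical eigenvalue spectrum.
lams = [8.0, 4.0, 2.0, 1.0, 1.0]
fractions = cumulative_variance(lams)
n_for_90_percent = eofs_needed(lams, 0.9)
```

Reading `fractions[31]` for a real spectrum would give the share of variance captured by the 32 retained EOFs.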

We implemented SALT and SFLT into FESOM2 (see the Appendix) and ran
the model with each choice of parameters and with the appropriate re-scaling as
detailed above. For all SALT runs we use $N_\xi = 32$ with the scaling $\gamma = 2 \times 10^{-3}$ m s$^{-1/2}$.
For SFLT we also take $N_\phi = 32$ but scale the parameters with $\gamma = 10^{2}$ m s$^{-1/2}$.
This re-scaling is chosen empirically, taking $\gamma$ to be the largest value
possible that will not result in model blow-up. It appears that the magnitude of
parameters that we are able to use for SFLT is much higher. This is possibly due to
the fact that SFLT does not involve any direct modification of the tracer equation.
SALT, on the other hand, includes an advection of the temperature by the stochastic
transport velocity; using higher values for this velocity may destabilise the tracer
equation and cause model blow-up.


The results of these runs are shown in Figs. 3, 4 and 5. Figure 3 shows the eddy
kinetic energy (EKE), defined by $E = \frac{1}{2}|u - \langle u\rangle|^2$, where $\langle u\rangle$ is the time-averaged
velocity. We notice that the eddy kinetic energy is significantly less in the coarse-grid
deterministic run than it is in the fine-grid run. This is probably due to the
fact that small scales are less present in the coarse-grid flow, and in the coarse-grid
model the viscosity used is greater and so kinetic energy is dissipated at a faster
rate. However, when we include SALT there is, for most choices of $\xi_i$, a notable
increase in EKE across the domain, particularly around the western boundary. The
exception is in the cases in which the coarse-grained velocities $\bar v$ used to calculate
$\xi_i$ are defined with only one application of the smoothing operator, as shown in
panels (c) and (d) in Fig. 3. This could be because, from Fig. 2, the inclusion of 32
$\xi_i$ captures a smaller amount of the total variability; it may also be that with stronger
filtering the $\xi_i$ are more spread out across the domain, as shown in Fig. 1, which overall has a
greater impact than having them more highly concentrated in one region. For SFLT
there is only a modest improvement in the EKE field, and the effect is similar for all
choices of the parameters. In all cases there appears to be little difference between
the Eulerian differences method and the Lagrangian paths method when the same
$N_{filt}$ is used.

Fig. 3 Time-average of eddy kinetic energy at depth 16 m below the surface. Panel (a) is from
the high-resolution (1/4°) deterministic model, while (b) is from the low-resolution (1/2°)
deterministic model. Panels (c), (g), (k) are the results of model runs at 1/2° with SALT, where
$\xi_i$ are computed from Lagrangian trajectories using a coarse velocity defined by applying the
smoothing filter 1, 4 and 32 times respectively. Panels (d), (h), (l) are also SALT runs but $\xi_i$ are
computed from Eulerian differences rather than Lagrangian trajectories. Panels (e), (i), (m) are
SFLT runs with $\phi_i$ computed from Lagrangian trajectories, while (f), (j), (n) have $\phi_i$ computed
from Eulerian differences

Fig. 4 Spectra of eddy kinetic energy for SALT (left panel) with $\xi_i$ calculated from Lagrangian
paths and from Eulerian differences; and for SFLT (right panel) with $\phi_i$ calculated from
Lagrangian paths and from Eulerian differences. Also included in each plot are the spectra for
the deterministic runs on the fine and coarse grids. Spectra are calculated in the x-direction at
fixed y = 451° by $|\hat E| := \frac{1}{t_{max} - t_0}\int_{t_0}^{t_{max}} \left| \int E(x,t)\,e^{-ikx}\,\mathrm{d}x \right| \mathrm{d}t$. Here $t_{max} = 10$ years and $t_0 = 0$

Fig. 5 Vertical profiles of temperature horizontally-averaged across the domain after 10 years of
model time. The left-hand panel shows the results from the SALT runs, alongside the deterministic
runs. The right-hand panel shows the results from the SFLT runs, alongside the deterministic runs
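
For concreteness, the time-averaged EKE at a single grid point can be computed from a velocity record as in the sketch below, which follows the definition $E = \frac{1}{2}|u - \langle u\rangle|^2$. The two-component velocity series are made-up values and the function name is hypothetical.

```python
def eddy_kinetic_energy(u_series, v_series):
    """Time mean of 0.5 * |u - <u>|^2 at one grid point, with <u> the time mean."""
    n = len(u_series)
    u_mean = sum(u_series) / n
    v_mean = sum(v_series) / n
    return sum(0.5 * ((u - u_mean) ** 2 + (v - v_mean) ** 2)
               for u, v in zip(u_series, v_series)) / n

# Hypothetical records: an oscillating zonal flow and a steady flow.
eke_oscillating = eddy_kinetic_energy([1.0, 3.0, 1.0, 3.0], [0.0, 0.0, 0.0, 0.0])
eke_steady = eddy_kinetic_energy([2.0, 2.0, 2.0, 2.0], [1.0, 1.0, 1.0, 1.0])
```

A steady flow has zero EKE regardless of its strength, which is why the diagnostic isolates the variability that a coarse, more viscous model tends to lose.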

We can also consider the spatial spectra, as shown in Fig. 4. There we see that
the 1/4° run contains higher EKE at all scales than the low-resolution run. Every
SALT run succeeds in increasing the energy at almost all scales and in shifting the
spectrum towards that of the 1/4° run. The most significant improvements are seen
in the run with Eulerian parameters computed with $N_{filt} = 32$; in contrast, there
is only a small change from the deterministic run when the $N_{filt} = 1$ Eulerian
parameters are used. For SFLT the improvement is again less noticeable, with all
choices of parameters only giving a slight increase in EKE at all scales.

Fig. 6 Time series of spatially-averaged temperature fields for SALT runs at z = −5 m (left panel)
and z = −1350 m (right panel)

Since we are working with the Primitive Equations, the buoyancy can have 
a large effect on the fluid flow. We therefore consider the temperature, which 
determines buoyancy directly via the linear equation of state. Figure 5 shows 
vertical temperature profiles at the end of the ten-year run. In the coarse-grid model 
there is a slightly lower average temperature in the upper layers of the fluid, and 
slightly higher temperatures in the lower layers. However, with SALT included 
there is, for some choices of parameters, a significant reduction in temperature in 
the upper layers, while at lower depths the temperature increases relative to the 
deterministic model. Considering the time series of spatially averaged temperature 
at z = −5 m and z = −1350 m in Fig. 6, we see the downwards diffusion effects are
persistent in time. In the deterministic case we see that the coarse-grid model has a 
stronger downwards diffusion of temperature than the fine-grid run. The inclusion of 
SALT also accelerates this downward-diffusion effect. It therefore appears that the 
calibrated stochastic terms we have included in the temperature equation with SALT 
cause a downwards-diffusion effect. Indeed, an additional SALT run (not shown), 
in which the stochastic terms were not included in the temperature advection, 
did not display this downwards diffusion behaviour. Thus, further investigation 
will be required in order to determine how to avoid the excessive downwards 
diffusion in the tracer equation while maintaining a positive effect on the EKE 
field. SFLT has very little effect on the temperature field when compared to that 
of the low-resolution model. This is expected, however, since there are no direct
stochastic effects in the temperature equation. Comparing the SALT runs, in which
the temperature downwards-diffusion effect is present, against the SFLT runs, we
believe that the temperature is the dominant driving force for the evolution of velocity, at
least at the resolutions we have considered here. The limited effect of the SFLT
framework on EKE is then explained by the fact that it does not affect the driving temperature
fields directly. It remains part of future work to consider the case where SFLT is
added to the temperature field.


5 Summary and Discussion 


This work lays the groundwork for the application of two relatively new stochastic 
parameterisation frameworks to the Primitive Equations. The first, SALT, has 
hitherto only been applied to simple idealised ocean models such as QG and 2D 
Euler. The second, SFLT, had not been investigated numerically prior to the present 
work. We have demonstrated some of the desirable theoretical properties of the
stochastic Primitive Equations with the noise added in these ways, notably the
preservation of a circulation theorem for SALT and energy conservation for SFLT.
We have proposed to calculate the parameters $\xi_i$ governing SALT and $\phi_i$ governing
SFLT by two different methods: Lagrangian paths and Eulerian differences. We
find that there are no significant differences between the two methods, either
in the parameters themselves or in the results of model runs. In this case it is
preferable to use the Eulerian differences method, as its parameters
are computationally less expensive to compute. However, we have used a set-up
in which the fine-grid resolution is only 2 times the coarse-grid resolution.
Using a larger ratio of grid resolutions would mean more time-steps are
needed in the Lagrangian paths and so may give EOFs that differ more
significantly than what we have observed here. We do observe, however, that there
are large sensitivities to the choice of smoothing used in defining the coarse-grained
velocity, from which the parameters are calculated. In the SALT case, the model
runs using parameters calculated with a strong smoothing filter show a significant
improvement in the eddy kinetic energy field at all depths, as well as in the eddy
kinetic energy spectrum. In the SFLT case, the improvements in the EKE field and
EKE spectrum are more modest than those obtained with SALT, due to the
lack of direct stochastic effects on the driving temperature fields. Considering the
temperature profile, however, we observe that SALT causes significant additional
downward diffusion when compared with the deterministic model. It remains an
open problem to devise a method to avoid this effect. The answer may lie in a
different method for configuring the parameters $\xi_i$ or it may be the case that this is
a property intrinsic to SALT. In either case, further study is needed in this direction.

The stochastic parameterisation frameworks considered in this paper distil all
uncertainties of the ocean models into the stochastic parameters $\xi_i$ and $\phi_i$. However,
the effects of these stochastic parameterisations could be limited by the model,
both physically and numerically. Examples of the limiting factors for the Primitive
Equations are the forcing from the temperature field and the artificial viscosity imposed
for numerical stability. The interplay between numerical effects such as artificial
viscosity and stochastic parameterisation is particularly interesting for future work.
This is because different numerical viscosities are imposed at different mesh resolutions
to ensure numerical stability, which influences the calibration process. Thus, we expect
there are limits to the effects of SALT and SFLT for low-resolution simulations
where viscosity is dominant. In high-resolution simulations, we expect to see
further effects of stochasticity as the influence of viscosity diminishes. After all,
the problem of stochastic parameterisation is not only model-dependent; it also
depends on the numerical method used to solve it.


Appendix: Numerical Implementation


In order to apply SALT and SFLT to FESOM2 we adapt the time-stepping scheme to 
include the appropriate stochastic terms. Details of the original (deterministic) time- 
stepping are given in [DSWJ16]. We modify the scheme from FESOM2 to a two- 
step Heun-type method [BBT04]; we choose this because of the use of Stratonovich 
integrals, to which the Heun method converges. The first step in the method is to 
compute the modified pressure: 


z zZ JE, AW, 
Sn biT” 2) dz / i yd?! atl 
Ph vos f ( Met Ba w"dz'— 
= 067 ng AB} yi a 
a Si. az ae At 


where $\Delta W_i^{n+1}$ and $\Delta B_i^{n+1}$ are independent, normally-distributed random variables
with mean 0 and variance $\Delta t$. For the sake of conciseness we shall assume that the
buoyancy depends only on temperature $T$, and that salinity is kept constant; however,
extending the method to include additional tracers should be straightforward.
The advective, diffusive and pressure parts of the momentum right-hand-side are
then computed:


Aa! = RR"? — y (pf + n") At +D (u, Aa") (30) 
where $R^{n+1/2}$ is an Adams-Bashforth interpolation of the advective and Coriolis
terms. In fact we have $R^{n+1/2} = \left(\tfrac{3}{2} + \epsilon\right) R^{n} - \left(\tfrac{1}{2} + \epsilon\right) R^{n-1}$, where $R^{n}$ =

Rv" Ar, u"] + J; (R[é;,u"] — VPE, -u n) AWi — >», R[v" » AB; |] and 
$R[v, u] := -\nabla\cdot(vu) - f \times v$. $D$ includes the horizontal and vertical diffusion terms,
as well as the external wind forcing.

The change in free surface height $\Delta\eta^{n+1}$ is computed implicitly:


0 0 
(1-sa°v. f voaz) art =v. f V- (u "4 Aa!) deat 
—H H 


j (31) 


Once this has been solved we can finally compute the stepped-forward horizontal 
velocity: 


\[ \tilde u^{n+1} = u^{n} + \Delta\tilde u^{n+1} - g\,\Delta t\,\nabla\Delta\eta^{n+1} \tag{32} \]


Then we solve for the total layer thickness $h$, which in the continuous case is the
same as the free surface height $\eta$; in the discrete case, however, they are different
and we compute:
A 0 
h” t?/2 = Artie =n van / at dzat 
-H 


In our present set-up we then set the free-surface height as a linear interpolation of 
the total layer heights: 


\[ \tilde\eta^{n+1} = \theta\,\tilde h^{n+3/2} + (1 - \theta)\,h^{n+1/2} \tag{33} \]


where $\theta \in [0, 1]$ is an arbitrary parameter, which we set equal to 1.
Since we have the horizontal velocity we may compute the vertical velocity: 

\[ \tilde w^{n+1} = -\nabla\cdot\int_{-H}^{z} \tilde u^{n+1}\,\mathrm{d}z' \tag{34} \]

The newly-computed three-dimensional velocity, along with the stochastic SALT
velocity, is then used to advect the tracer:


prt3/2 — T”+1/2 _ Ry Eee T”-1/2 ent] py + £,AWn+3/2| +K [oe] 
(35) 


where $R_T$ denotes the advection scheme and $K$ is the diffusion. From these steps we
compute intermediate values $\tilde x^{n+1} = (\tilde u^{n+1}, \tilde\eta^{n+1}, \tilde h^{n+3/2}, \tilde T^{n+3/2})$ from values at
the previous two time steps: $u^{n}$, $u^{n-1}$, $h^{n+1/2}$, $T^{n+1/2}$, $T^{n-1/2}$. We may write this
schematically as:


\[ \tilde x^{n+1} = x^{n} + F\left[x^{n}, x^{n-1}\right] \tag{36} \]


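where $F$ is an operator representing the computations outlined above. For the
corrector step we follow the same steps as above to compute $F\left[\tilde x^{n+1}, x^{n}\right]$, and
we have the overall evolution given by:


\[ x^{n+1} = x^{n} + \tfrac{1}{2}\left( F\left[x^{n}, x^{n-1}\right] + F\left[\tilde x^{n+1}, x^{n}\right] \right) \tag{37} \]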


This method differs from the usual Heun method because the right-hand side 
depends on the previous two time-steps, rather than just the previous one. It remains 
to prove that adding the stochasticity with this method does converge to the required 
Stratonovich integrals. 
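
To illustrate why a Heun-type predictor-corrector is a natural choice for Stratonovich noise, the sketch below applies the standard single-step Heun scheme to the scalar test equation $\mathrm{d}X = X \circ \mathrm{d}W$, whose Stratonovich solution is $X_t = X_0 e^{W_t}$. This is not the two-level FESOM2 variant described above; the test equation, step size, random seed and function names are assumptions chosen for illustration.

```python
import math
import random

def heun_step(x, drift, diffusion, dt, dw):
    """One predictor-corrector (Heun) step for dX = drift dt + diffusion o dW."""
    x_pred = x + drift(x) * dt + diffusion(x) * dw          # Euler predictor
    return (x
            + 0.5 * (drift(x) + drift(x_pred)) * dt         # averaged drift
            + 0.5 * (diffusion(x) + diffusion(x_pred)) * dw)  # averaged noise

def integrate(x0, drift, diffusion, dt, increments):
    """Advance x0 through a prescribed sequence of Brownian increments."""
    x = x0
    for dw in increments:
        x = heun_step(x, drift, diffusion, dt, dw)
    return x

# Test problem: dX = X o dW, with exact Stratonovich solution X_0 * exp(W_t).
rng = random.Random(0)
n_steps, dt = 2000, 1.0e-3
increments = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n_steps)]
x_heun = integrate(1.0, lambda x: 0.0, lambda x: x, dt, increments)
x_exact = math.exp(sum(increments))
```

For small time steps `x_heun` should track `x_exact` closely along the same Brownian path, whereas a plain Euler-Maruyama step would instead approximate the Itô solution $X_0 e^{W_t - t/2}$.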


A Pathwise Parameterisation for
Stochastic Transport


Oana Lang and Wei Pan 


Abstract In this work we set the stage for a new probabilistic pathwise approach to 
effectively calibrate a general class of stochastic nonlinear fluid dynamics models. 
We focus on a 2D Euler SALT equation, showing that the driving stochastic param- 
eter can be calibrated in an optimal way to match a set of given data. Moreover, we 
show that this model is robust with respect to the stochastic parameters. 


1 Introduction 


A fundamental challenge in observational sciences, such as weather forecasting and 
climate change predictions, is the modelling of uncertainty due, for example, to 
unknown or neglected physical effects, and incomplete information in both the data 
and the formulation of the theoretical models for prediction. Various dynamical 
parameterisation approaches have been proposed to tackle this challenge, see e.g. 
[6], [4], [11], [5], [1]. Of particular interest are the recently developed Data Driven 
models, that accommodate uncertainty by predicting both the expected future 
measurement values and their uncertainties, based on input from measurements 
and statistical analysis of the initial data. To effectively incorporate uncertainty 
in the data driven approach, such predictions are made in a probabilistic sense. 
Additionally, a data assimilation procedure is used to take into account the time 
integrated information obtained from the data being observed along the solution 
path during the forecast interval as “in flight corrections”. 

In the geoscience community, data assimilation (DA) refers to a set of methodologies
designed to efficiently combine past knowledge of a geophysical system (in
the form of a numerical model) with new information about that system (in the form
of observations). DA is a central component of Numerical Weather Prediction where
it is used to improve forecasting by adjusting the model parameters and reducing the
uncertainties. To achieve this, a stochastic feedback loop between the model and the
observation may be introduced: the assimilation of more data during the prediction
interval will then decrease the uncertainty of the forecasts based on the initial data,
by selecting the more likely paths as more observational data is collected. This is
the basis of the so-called ensemble data assimilation which uses a set of model
trajectories that are intermittently updated according to data.

A key step for ensuring the successful application of the combined stochastic
parameterisation and data assimilation procedure is the “correct” calibration of
stochastic model parameters. For Stochastic Advection by Lie Transport (SALT)
and Location Uncertainty (LU) models, current numerical methods for calibration,
see [4], [1], [5], [12], have largely been inspired by the physical interpretation of
the models' derivations, more specifically by the assumption that the flow map
is decoupled into a slow scale mean part and a fast scale fluctuating part. In the
references mentioned before, it was shown that these methods are effective and led
to a successful combination of data driven models and state of the art data assimilation
techniques.

In this work, we wish to investigate the feasibility and viability of a probabilistic
pathwise approach for calibration. Our general aim is to explore such ideas for
a wide class of nonlinear stochastic transport models. This will be very useful 
in data assimilation problems, as in real world applications the signal is usually 
observed through discrete observations, but no results of this type for SALT or 
LU models have been obtained before. Currently, Lagrangian particle trajectories 
are simulated starting from each point on both the physical grid and its refined 
version, then the differences between the particle positions are used to calibrate the 
noise. This is computationally expensive and not fully justified from a theoretical 
perspective. In the same spirit as [3] but with a more complicated noise term and 
without any smoothing effects of a Laplacian, we propose an approach which uses 
high-frequency in time and low-frequency in space observations of a single path 
of the solution, to rigorously infer properties of the stochastic parameters. The 
knowledge of the noise is crucial for determining the behaviour of the solution and 
for assessing to what degree the solution of the coarse resolution SPDE deviates 
from the solution of the fine resolution PDE in the model reduction procedure, so an 
optimal calibration of the noise parameters is relevant from both a theoretical and 
an applied perspective. 

In this work we look at stochastic calibration for the two-dimensional incom- 
pressible Euler equation in vorticity form. This stochastic equation models the local 
rotation of a fluid flow in the presence of spatial uncertainties and it has been 
derived from fundamental principles in [6]. This equation is a key ingredient in 
modelling phenomena in oceanography and in order to ensure that it efficiently 
encodes the small-scale variability in the upper part of the ocean, one needs to 
specify the stochastic parameters based on real observations. One of the main issues 
in parameter estimation using real data is the fact that the model parameters do not 
map to observations in a unique way (model identifiability problem, see e.g. [2]). 
For this reason, we believe that a probabilistic approach is much more suitable. 

The 2D Euler equation in the form derived in [6] and studied in [4], [5] and [8]
is given by:

$$d\omega_t + u_t \cdot \nabla\omega_t\,dt + \sum_{i=1}^{\infty} \xi_i \cdot \nabla\omega_t \circ dW_t^i = 0 \qquad (1)$$

where $u = (u^1, u^2)$ is the fluid velocity, $\omega = \operatorname{curl} u = \partial_2 u^1 - \partial_1 u^2$ is the vorticity,
$(\xi_i)_i$ are divergence-free time-independent vector fields such that

$$\sum_{i=1}^{\infty} \|\xi_i\|_{k+1,\infty}^2 < \infty \qquad (2)$$

and $(W^i)_{i\in\mathbb{N}}$ is a sequence of independent Brownian motions. Global well-posedness
for Eq. (1) has been proven in [8] and the numerical and data assimilation
perspective has been studied in [4] and [5]. In [8] the authors have shown that
Eq. (1) admits a unique pathwise solution which belongs to the Sobolev space
$W^{k,2}(\mathbb{T}^2)$ ($k > 2$) when $\omega_0 \in W^{k,2}(\mathbb{T}^2)$ and which can be extended to $L^\infty(\mathbb{T}^2)$
when $\omega_0 \in L^\infty(\mathbb{T}^2)$.


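In this paper we consider the following SPDE on the two-dimensional torus $\mathbb{T}^2 =
\mathbb{R}^2/\mathbb{Z}^2$, driven by a one-dimensional Brownian motion $W$:

$$d\omega_t + u_t \cdot \nabla\omega_t\,dt + \xi \cdot \nabla\omega_t \circ dW_t = 0 \qquad (3)$$

where $u$ and $\omega$ are as above and $\circ$ denotes Stratonovich integration. We impose the
following condition on the stochastic parameter $\xi$, in the same spirit as (2):

$$\|\xi\|_{k+1,\infty} < \infty \qquad (4)$$

with $k > 4$. This condition ensures that for any $f \in W^{2,2}(\mathbb{T}^2) \cap W^{2,\infty}(\mathbb{T}^2)$,

$$\|\xi \cdot \nabla f\|_2^2 \le C\|f\|_{1,2}^2, \qquad \|\xi \cdot \nabla(\xi \cdot \nabla f)\|_2^2 \le C\|f\|_{2,2}^2, \qquad (5)$$

$$\|\xi \cdot \nabla f\|_\infty^2 \le C\|f\|_{1,\infty}^2, \qquad \|\xi \cdot \nabla(\xi \cdot \nabla f)\|_\infty^2 \le C\|f\|_{2,\infty}^2. \qquad (6)$$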


Remark 1 We can view the stochastic part as a space-time noise $(\xi, W)$ where the
spatial component is given by $\xi$ and the time component is a standard Brownian
motion. This perspective is often useful in numerical applications, where $(\xi \circ
dW_t) \cdot \nabla$ is implemented as a random operator applied to the solution $\omega$.


The problem of parameter estimation, known also as statistical inference, is 
technically challenging for such (infinite-dimensional) SPDEs driven by transport 
noise, as most methods used in the literature benefit from a diagonalizable structure 
of the underlying space-covariance matrices. This structure is specific to additive
noise and therefore it does not apply in our case. Also, most results are obtained
for stochastic variations of the heat equation, which contain a smoothing Laplace
operator (see for instance [3]). Our model does not contain a Laplacian a priori, and
therefore we cannot exploit the properties of a heat kernel. This makes the analysis
much harder.


Contributions of the Paper 
In this work, we focus on Eq. (3) from two perspectives: 


• First, we show that the driving stochastic parameter $\xi$ can be calibrated in an
optimal way to match a given set of high-frequency-in-time data. This is done
using a forced and damped version of the equation and a parametric form
of the stream function and the corresponding stochastic parameter, which is
implemented using an orthonormal basis. Our technique can be explicitly applied
to calibrate the 2D Euler model using real oceanic data and we intend to do this
in future work.

• Second, we show that the original 2D Euler model is robust with respect to the
stochastic parameter $\xi$, in the sense that if we consider two couples $(\omega^1, \xi^1)$ and
$(\omega^2, \xi^2)$ which solve Eq. (3), then the $L^2$ distance between $\omega^1$ and $\omega^2$ can be
controlled using only the initial conditions and the difference between $\xi^1$ and $\xi^2$
(see Sect. 4). This is important in applications as it shows that if we consider
approximate values for $\xi$, the corresponding model solution remains close to the
true solution.


Structure of the Paper 

In Sect. 2 below we present the problem formulation. In Sect. 3 we introduce the 
methodology. In Sect. 4 we prove the robustness of the original model and in Sect. 5 
we present the numerical results. 


2 Problem Formulation 


Let $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \ge 0}, \mathbb{P})$ be a filtered probability space and $W$ a one-dimensional
Brownian motion adapted to the complete and right-continuous filtration $(\mathcal{F}_t)_{t \ge 0}$.

Let $h : \mathbb{R} \to \mathbb{R}$ be a smooth function representing some observation map. We
assume we have available a finite sequence of high-frequency-in-time snapshots
of observed vorticity fields, which are denoted by $h(\omega^*)_{t_i}(x) := h(\omega^*_{t_i})(x)$, $i =
1, \dots, N$, and are adapted to $(\mathcal{F}_t)_{t \ge 0}$. We take the view that the $h(\omega^*)_{t_i}$'s are the
given observation data. We further assume that $\omega_0 \in W^{k,2}(\mathbb{T}^2)$, $k > 4$.

Writing $\omega^\xi$ to denote solutions to the model (3) for a given vector field $\xi$, the
generic problem we are interested in is to find a $\xi$ so that solutions to (3) match
the data as well as possible, i.e.

$$\arg\min_{\xi} \|\omega^* - \omega^\xi\| \qquad (7)$$

for some suitable norm.¹


The dimension of the observations currently coincides with the number of 
sources of noise, that is we have a determined system. However, in practice 
this is not always a realistic assumption and in future work we will look at 
underdetermined or overcomplete systems i.e. when the number of noise sources 
is larger than the dimension of the observation operator. 

In general, the infinite-dimensional optimisation problem (7) may be too hard
to solve in practice. We thus make concrete the form of $\xi$. Let $(e_j)_{j \in \mathbb{N}}$ be an
orthonormal basis in $L^2(\mathbb{T}^2)$. We assume the following parametric form for the
stream function of $\xi$, which is henceforth denoted by $\zeta$,

$$\zeta(x) = \sum_{j=1}^{\infty} \alpha_j e_j(x), \qquad (8)$$

where the $\alpha_j$ are reals. Then

$$\xi(x) = \nabla^\perp \zeta(x) = \sum_{j=1}^{\infty} \alpha_j \nabla^\perp e_j(x) \qquad (9)$$

and the optimisation problem (7) then reduces to finding the coefficients $\alpha_j$.
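
As a rough illustration of (8) and (9), the following Python sketch builds a stream function from a few real Fourier modes on the unit torus, forms $\xi = \nabla^\perp \zeta$ with FFT derivatives, and checks numerically that the result is divergence-free. The grid size, the coefficients, the use of real cosine modes in place of a general orthonormal basis, and the sign convention $\nabla^\perp = (\partial_y, -\partial_x)$ are all assumptions made for this example only.

```python
import numpy as np

def wavenumbers(n):
    """Spectral derivative multipliers 2*pi*i*k on the unit torus (n x n grid)."""
    k = 2j * np.pi * np.fft.fftfreq(n, d=1.0 / n)
    return np.meshgrid(k, k, indexing="ij")

def stream_function(alphas, modes, n):
    """zeta(x, y) = sum_j alpha_j * cos(2*pi*(k1*x + k2*y)) for modes (k1, k2)."""
    x = np.arange(n) / n
    X, Y = np.meshgrid(x, x, indexing="ij")
    zeta = np.zeros((n, n))
    for a, (k1, k2) in zip(alphas, modes):
        zeta += a * np.cos(2 * np.pi * (k1 * X + k2 * Y))
    return zeta

def perp_gradient(zeta):
    """xi = (d_y zeta, -d_x zeta), computed with FFT derivatives."""
    ikx, iky = wavenumbers(zeta.shape[0])
    zh = np.fft.fft2(zeta)
    return np.real(np.fft.ifft2(iky * zh)), -np.real(np.fft.ifft2(ikx * zh))

def divergence(vx, vy):
    """d_x vx + d_y vy, computed with FFT derivatives."""
    ikx, iky = wavenumbers(vx.shape[0])
    return np.real(np.fft.ifft2(ikx * np.fft.fft2(vx) + iky * np.fft.fft2(vy)))

# Hypothetical coefficients and wavenumbers, chosen only for illustration.
alphas, modes = [1e-3, 5e-4], [(2, 4), (1, 3)]
zeta = stream_function(alphas, modes, n=64)
xi_x, xi_y = perp_gradient(zeta)
max_div = np.abs(divergence(xi_x, xi_y)).max()
print(max_div)  # should be at round-off level
```

Because $\xi$ is obtained as a perpendicular gradient, the divergence-free constraint holds by construction, so only the coefficients $\alpha_j$ remain to be estimated.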


3 Methodology 


For a stochastic process $X_t$ defined on a filtered probability space, its quadratic
variation is defined by

$$[X]_t := \lim_{\max_i \Delta t_i \to 0} \sum_{i=1}^{n} (X_{t_i} - X_{t_{i-1}})^2, \qquad (10)$$

where $t_0 = 0 < t_1 < \dots < t_n = t$ is a partition of the interval $[0, t]$, $\Delta t_i :=
|t_i - t_{i-1}|$, and the convergence is in the sense of probability (see e.g. [7]).
From (3) and (9) we have

$$\omega_t(x) = \omega_0(x) - \int_0^t B_s(x; \omega)\,ds - \int_0^t \sum_j \alpha_j \nabla^\perp e_j(x) \cdot \nabla\omega_s(x) \circ dW_s, \qquad (11)$$

in which, for notational simplicity, we have introduced $B_s(x; \omega) := u_s(x) \cdot \nabla\omega_s(x)$.


¹ By the assumed regularity of $h$, any solution to (7) is also a solution to $\arg\min_\xi \|h(\omega^*) - h(\omega^\xi)\|$.
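
Using Itô's lemma, and following standard results on the quadratic variation of
semimartingales, it is straightforward to show that

$$[h(\omega)]_t = \sum_{i,j=1}^{\infty} \alpha_i \alpha_j \int_0^t \langle h'(\omega_s), \nabla^\perp e_i \cdot \nabla\omega_s \rangle \langle h'(\omega_s), \nabla^\perp e_j \cdot \nabla\omega_s \rangle\,ds. \qquad (12)$$

Due to global existence and uniqueness of solutions to (3), $[h(\omega)]_t$ exists
globally $\mathbb{P}$-almost surely. Thus the right-hand side of (12) can be arbitrarily well
approximated by its truncation for all $t$, i.e. for a given $\epsilon > 0$, there exists $M_\epsilon$ such
that

$$\left| [h(\omega)]_t - \sum_{i,j=1}^{M_\epsilon} \alpha_i \alpha_j \int_0^t \langle h'(\omega_s), \nabla^\perp e_i \cdot \nabla\omega_s \rangle \langle h'(\omega_s), \nabla^\perp e_j \cdot \nabla\omega_s \rangle\,ds \right| < \epsilon. \qquad (13)$$
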
Additionally, from the computational perspective, for any fixed $M_\epsilon$, the linear map

$$A_{ij} = \int_0^t \langle h'(\omega_s), \nabla^\perp e_i \cdot \nabla\omega_s \rangle \langle h'(\omega_s), \nabla^\perp e_j \cdot \nabla\omega_s \rangle\,ds \qquad (14)$$

that defines the truncated quadratic form is symmetric and positive definite,² and
thus can be diagonalised by a unitary linear map. Doing so, we obtain the following
linear problem

$$[h(\omega)]_t = \sum_{j=1}^{M_\epsilon} \lambda_j \tilde{\alpha}_j^2 + \epsilon', \qquad (15)$$

where $\epsilon'$ denotes the truncation error of (12), $\lambda_j$ are the eigenvalues of the associated
linear map, and the $\tilde{\alpha}_j$'s are the original $\alpha$ values rescaled by the unitary
matrix from the diagonalisation.

We can estimate $[h(\omega)]_t$ using the high-frequency-in-time data $h(\omega^*)$ and (10),
assuming the discrete sum converges fast enough,

$$[h(\omega)]_t \approx [h(\omega)]_{t,N} := \sum_{i=1}^{N} \left( h(\omega^*)_{t_i} - h(\omega^*)_{t_{i-1}} \right)^2. \qquad (16)$$

The estimate $[h(\omega)]_{t,N}$ could then be used in (15) to get an estimate for the $\tilde{\alpha}$.
One could then recover the original $\alpha$'s by applying the unitary linear map that is
associated with the diagonalisation of $A_{ij}$.
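
The step from (14) to (15) is an eigendecomposition of a symmetric matrix. The Python sketch below is only an illustration: it uses a randomly generated Gram matrix as a stand-in for the truncated $A_{ij}$ (an assumption; in practice the entries would come from the time integrals in (14)) and hypothetical coefficients $\alpha_j$. It checks that the quadratic form $\sum_{i,j} \alpha_i \alpha_j A_{ij}$ equals $\sum_j \lambda_j \tilde{\alpha}_j^2$ with $\tilde{\alpha} = Q^\top \alpha$, and that applying $Q$ recovers the original coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the truncated matrix A_ij in (14): a Gram matrix G G^T / n is
# symmetric and positive semi-definite, like a time-discretised version of (14).
M, n_steps = 5, 200
G = rng.standard_normal((M, n_steps))
A = G @ G.T / n_steps

alpha = rng.standard_normal(M)          # hypothetical coefficients alpha_j

# Symmetric eigendecomposition A = Q diag(lam) Q^T with orthogonal Q.
lam, Q = np.linalg.eigh(A)
alpha_tilde = Q.T @ alpha               # rotated coefficients

quad_form = alpha @ A @ alpha           # sum_ij alpha_i alpha_j A_ij
diag_form = np.sum(lam * alpha_tilde**2)

# Rotating back recovers the original coefficients.
alpha_back = Q @ alpha_tilde
print(quad_form, diag_form)
```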


² Since $[h(\omega)]_t$ is strictly positive.

Example 1 Let $h$ be the identity map. Let $e_k = e^{ik \cdot x}$ be the Fourier basis. Then we
have


sand t 
folv= >. wiaj | eet Vo) (KF + Vos) ds. (17) 
i,j=l 
with K;,Kj¢Z7 


In Sect. 5 we test numerically Eq. (17) for an idealised example, and show we can 
adequately recover the basis coefficients using our methodology. 


Example 2 In this example, we assume the data are the kinetic energy of the flow,

$$E_t := \frac{1}{2} \int_{\mathbb{T}^2} |u_t|^2\,dx. \qquad (18)$$

Thus the data are “indirect” information about the vorticity. Note that the energy 
data is feasible for SALT models as energy is not a conserved quantity of SALT. 

Below, we avoid calculating the pressure term of the Euler system by utilising the 
Biot-Savart operator K that links the velocity field to the vorticity field in Eq. (3). 
For further discussions on this topic see [9] or [10]. We have 


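$$u(x) = (K * \omega)(x) = \int_{\mathbb{T}^2} K(x - y)\,\omega(y)\,dy \qquad (19)$$

where

$$K(x) = \sum_{k \in \mathbb{Z}^2 \setminus \{0\}} \frac{i k^\perp}{2\pi |k|^2}\, e^{2\pi i k \cdot x}. \qquad (20)$$

It is known that, for any $k > 0$, there exists a constant $C_{k,2}$, independent of
$u$, such that

$$\|u\|_{k+1,2} \le C_{k,2} \|\omega\|_{k,2}.$$

If $\psi : \mathbb{T}^2 \times [0, \infty) \to \mathbb{R}$ is a solution of $\Delta\psi = -\omega$, then $u = \nabla^\perp \psi$ solves
$\omega = \operatorname{curl} u$, so $u = -\nabla^\perp \Delta^{-1} \omega$. The reconstruction of $u$ from $\omega$ is ensured by
the incompressibility condition $\nabla \cdot u = 0$, and a periodic, distributional solution of
$\Delta\psi = -\omega$ is given by

$$\psi(x) = (G * \omega)(x)$$

where $G$ is the Green's function of the operator $-\Delta$ on $\mathbb{T}^2$,

$$G(x) = \sum_{k \in \mathbb{Z}^2 \setminus \{0\}} \frac{e^{2\pi i k \cdot x}}{4\pi^2 |k|^2},$$

and $k = (k_1, k_2)$, $k^\perp = (k_2, -k_1)$.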
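
The reconstruction of the velocity from the vorticity described above can be sketched numerically with FFTs on the unit torus. This is a minimal illustration, not the discretisation used by the authors: it assumes a uniform periodic grid, the conventions $\Delta\psi = -\omega$ and $u = \nabla^\perp\psi = (\partial_y \psi, -\partial_x \psi)$, and a mean-zero vorticity field. Sign conventions for the curl vary between references, so the signs below are an assumption of the example. The kinetic energy (18) is then approximated by a grid average.

```python
import numpy as np

def velocity_from_vorticity(omega):
    """Solve -Laplace(psi) = omega on the unit torus, return u = (d_y psi, -d_x psi)."""
    n = omega.shape[0]
    k = 2 * np.pi * np.fft.fftfreq(n, d=1.0 / n)
    kx, ky = np.meshgrid(k, k, indexing="ij")
    k2 = kx**2 + ky**2
    k2[0, 0] = 1.0                       # placeholder to avoid dividing by zero
    psi_hat = np.fft.fft2(omega) / k2    # Fourier symbol of (-Laplace)^{-1}
    psi_hat[0, 0] = 0.0                  # fix the mean of psi to zero
    u1 = np.real(np.fft.ifft2(1j * ky * psi_hat))
    u2 = np.real(np.fft.ifft2(-1j * kx * psi_hat))
    return u1, u2

def kinetic_energy(u1, u2):
    """E = 1/2 * integral of |u|^2 over the unit torus (grid average)."""
    return 0.5 * np.mean(u1**2 + u2**2)

n = 32
x = np.arange(n) / n
X, Y = np.meshgrid(x, x, indexing="ij")
psi = np.sin(2 * np.pi * X) * np.cos(4 * np.pi * Y)   # test stream function
omega = 20 * np.pi**2 * psi                            # omega = -Laplace(psi)
u1, u2 = velocity_from_vorticity(omega)
energy = kinetic_energy(u1, u2)
print(energy)
```

Working in Fourier space avoids evaluating the convolution with $K$ directly, and, as noted in the text, no pressure solve is needed.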


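Combining (11) with the Biot-Savart law (19) we obtain

$$u_t(x) = u_0(x) - \int_0^t \int_{\mathbb{T}^2} K(x - y) B_s(y; \omega)\,dy\,ds - \int_0^t \int_{\mathbb{T}^2} K(x - y)\,\xi(y) \cdot \nabla\omega_s(y)\,dy \circ dW_s. \qquad (21)$$

Using Itô's lemma, we obtain

$$E_t - E_0 = -\int_0^t \left\langle u_s, K * \left( B_s - \tfrac{1}{2}\,\xi \cdot \nabla(\xi \cdot \nabla\omega_s) \right) \right\rangle ds - \int_0^t \langle u_s, K * (\xi \cdot \nabla\omega_s) \rangle\,dW_s \qquad (22)$$

where $\langle \cdot, \cdot \rangle$ is the standard $L^2(\mathbb{T}^2)$ pairing. Thus

$$[E]_t = \sum_{i,j=1}^{\infty} \alpha_i \alpha_j \int_0^t \langle u_s, K * (\nabla^\perp e_i \cdot \nabla\omega_s) \rangle \langle u_s, K * (\nabla^\perp e_j \cdot \nabla\omega_s) \rangle\,ds. \qquad (23)$$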


4 Robustness 


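Theorem 2 Let $\omega^1$, $\omega^2$ be two solutions of the 2D Euler equation (3) and $\xi^1$, $\xi^2$ the
corresponding stochastic parameters for each of these two solutions. More precisely,
$(\omega^\ell, \xi^\ell)$ for $\ell = 1, 2$ solves

$$d\omega_t^\ell + u_t^\ell \cdot \nabla\omega_t^\ell\,dt + \xi^\ell \cdot \nabla\omega_t^\ell\,dW_t = \frac{1}{2}\,\xi^\ell \cdot \nabla(\xi^\ell \cdot \nabla\omega_t^\ell)\,dt. \qquad (24)$$

Then for any $p > 2$ there exist some constants³ $C = C(p, T)$, $C_{1,p}$, $C_{2,p}$, such that

$$\mathbb{E}\left[ \sup_{t \in [0,T]} e^{-\gamma(t)} \|\omega_t^1 - \omega_t^2\|_2^{2p} \right] \le C_{p,T} \left( \|\omega_0^1 - \omega_0^2\|_2^{2p} + \|\xi^1 - \xi^2\|_2^{2p} + \|\xi^1 - \xi^2\|_{1,2}^{2p} \right) \qquad (25)$$

where

$$\gamma(t) = C_{1,p} \int_0^t \|\omega_r^1\|_{k,2}\,dr + C_{2,p}\,t$$

and $k > 4$.

³ In this theorem all constants generically denoted by $C$, $C_{p,T}$, $C_{1,p}$, $C_{2,p}$ may differ from line
to line and from term to term.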
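Proof of Theorem 2 Let $\bar{\omega} := \omega^1 - \omega^2$, $\bar{u} := u^1 - u^2$, $\bar{\xi} := \xi^1 - \xi^2$. Then $\bar{\omega}$ satisfies

$$d\bar{\omega}_t + (\bar{u}_t \cdot \nabla\omega_t^1 + u_t^2 \cdot \nabla\bar{\omega}_t)\,dt + (\xi^1 \cdot \nabla\omega_t^1 - \xi^2 \cdot \nabla\omega_t^2)\,dW_t = \frac{1}{2} \left( \xi^1 \cdot \nabla(\xi^1 \cdot \nabla\omega_t^1) - \xi^2 \cdot \nabla(\xi^2 \cdot \nabla\omega_t^2) \right) dt.$$

By the Itô formula:

$$\begin{aligned} d\|\bar{\omega}_t\|_2^2 = {} & -2 \langle \bar{\omega}_t, \xi^1 \cdot \nabla\omega_t^1 - \xi^2 \cdot \nabla\omega_t^2 \rangle\,dW_t - 2 \langle \bar{\omega}_t, \bar{u}_t \cdot \nabla\omega_t^1 + u_t^2 \cdot \nabla\bar{\omega}_t \rangle\,dt \\ & + \langle \bar{\omega}_t, \xi^1 \cdot \nabla(\xi^1 \cdot \nabla\omega_t^1) - \xi^2 \cdot \nabla(\xi^2 \cdot \nabla\omega_t^2) \rangle\,dt \\ & + \langle \xi^1 \cdot \nabla\omega_t^1 - \xi^2 \cdot \nabla\omega_t^2, \xi^1 \cdot \nabla\omega_t^1 - \xi^2 \cdot \nabla\omega_t^2 \rangle\,dt. \end{aligned} \qquad (26)$$

We introduce the following notation:

$$m_t := \|\bar{\omega}_t\|_2^2, \qquad Z := \|\bar{\xi}\|_2^2, \qquad L := \|\bar{\xi}\|_{1,2}^2,$$

$$\begin{aligned} A_t := {} & -2 \langle \bar{\omega}_t, \bar{u}_t \cdot \nabla\omega_t^1 + u_t^2 \cdot \nabla\bar{\omega}_t \rangle + \langle \bar{\omega}_t, \xi^1 \cdot \nabla(\xi^1 \cdot \nabla\omega_t^1) - \xi^2 \cdot \nabla(\xi^2 \cdot \nabla\omega_t^2) \rangle \\ & + \langle \xi^1 \cdot \nabla\omega_t^1 - \xi^2 \cdot \nabla\omega_t^2, \xi^1 \cdot \nabla\omega_t^1 - \xi^2 \cdot \nabla\omega_t^2 \rangle, \end{aligned}$$

$$D_t := \int_0^t \langle \bar{\omega}_s, \xi^1 \cdot \nabla\omega_s^1 - \xi^2 \cdot \nabla\omega_s^2 \rangle\,dW_s,$$

$$\phi(t) := C \|\omega_t^1\|_{k,2} + C,$$

$$\psi(t) := (C \|\omega_t^1\|_{k,2} + C) Z + C \|\omega_t^1\|_{k,2} L,$$

$$\tilde{Z} := C \|\omega_t^1\|_{k,2} Z.$$

Then we can write (26) as

$$dm_t = A_t\,dt - 2\,dD_t.$$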

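We want to estimate each of the terms which appear in (26). The difference of the
nonlinear terms is analysed explicitly in [8], p. 9:

$$\langle \bar{\omega}_t, \bar{u}_t \cdot \nabla\omega_t^1 \rangle \le \|\bar{\omega}_t\|_2 \|\bar{u}_t\|_4 \|\nabla\omega_t^1\|_4 \le C \|\bar{\omega}_t\|_2^2 \|\omega_t^1\|_{k,2} = C \|\omega_t^1\|_{k,2}\,m_t.$$

We used here that $\|\nabla\omega_t^1\|_4 \le C \|\omega_t^1\|_{k,2}$ and $\|\bar{u}_t\|_4 \le C \|\bar{u}_t\|_{1,2} \le C \|\bar{\omega}_t\|_2$. Also,
since $u^2$ is divergence-free, $\langle \bar{\omega}_t, u_t^2 \cdot \nabla\bar{\omega}_t \rangle = -\frac{1}{2} \int_{\mathbb{T}^2} (\nabla \cdot u_t^2)(\bar{\omega}_t)^2\,dx = 0$. We
estimate the difference terms which include $\xi^1$ and $\xi^2$ in Lemma 3 below. Note here
that the term $\langle \bar{\omega}_t, \xi^2 \cdot \nabla(\xi^2 \cdot \nabla\bar{\omega}_t) \rangle$ is non-positive. Using these estimates and Lemma 3
below we have that

$$A_t\,dt \le \psi(t)\,dt + \phi(t)\,m_t\,dt.$$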


Then

$$d\left( e^{-\int_0^t \phi(s)\,ds}\,m_t \right) = e^{-\int_0^t \phi(s)\,ds} \left( dm_t - \phi(t)\,m_t\,dt \right) \le e^{-\int_0^t \phi(s)\,ds} \left( \psi(t)\,dt - 2\,dD_t \right).$$


After raising everything to the power p > 2,7 taking the supremum over t € [0, T] 
and then the expectation, we obtain 


t P y 

= ġ(s)ds t -f o(r)dr 

E| sup Je /0 mı < Cpm +Cp sup if e J0 y(s)ds 
te[0,T] telo, T] |/0 


s P 
t -f ġ(r)dr 
+CpE | sup fe 0 dD; 
0 


te[0,T] 

(27) 

For the stochastic integral we use the Burkholder-Davis-Gundy inequality: for 
arbitrary p > 2 and a martingale M, there exists a constant C,, such that’ 


| sup od < CE [IMF] 


te[0,T] 


where [M], is the quadratic variation of the martingale M,. In our case 


4 We use here and below that |a + b|? < 2?! (ja|? + |b|”), p > 2. 
5 In this proof C, C p are generic constants which may differ from line to line and from term to 
term. 

$$M_t := \int_0^t e^{-\int_0^s \phi(r)\,dr}\,dD_s$$

and then

$$[M]_t = \int_0^t e^{-2\int_0^s \phi(r)\,dr}\,d[D]_s = \int_0^t e^{-2\int_0^s \phi(r)\,dr} \left| \langle \bar{\omega}_s, \xi^1 \cdot \nabla\omega_s^1 - \xi^2 \cdot \nabla\omega_s^2 \rangle \right|^2 ds.$$


Therefore? 


p/2 


rf d@ar 
[tmp] =E fe (ðs, E! - Væl — £? - Vo?) |2ds 


0 


IA 


Ss 
T -f o(r)dr : i E 5 
Cp, TE f e J0 (ðs, E - Va, — €° - Vos) |? ds 
0 


; 
r f oad, 
< Cpr f è| sup e 0 (m? +2”) ds. 
0 


re[0,s] 


Using these estimates in (27) we obtain 


t P s 

-f ġ(s)ds t -f o(r)dr 

E| sup Je /0 mı < Cpm} + Cp,rE sup f e 0 p(s)? ds 
te[0,T] te[0,T] J0 


P 
T -f ġ(q)dq Jo os 
+Cor f sup e 0 (mi +2?) ds 
0 re[0,s] 


(28) 
For the second term on the right hand side of (28) we use that, since Z is 
deterministic and by [8] the 2D Euler equation (3) has a unique global solution 
in W*-2(T?) for k > 2, there exist Č}, CF such that for all t € [0, T] 


6 We use here the control obtained for Q in Lemma 3. More precisely: since Q < Cm; + Z then 
QP < Cpm? + ŽP). 
Pp
| sup Hor = | sup (Clos lk + OZ + Clos le 2L) l 


se[0,t] 


< ZPE l sup (Clæl lZ 2+ or | + L?E l sup (Cla, "| 
se[0,t] se[0,t] 


rl yp 27 p 
< CpZ” + CL”. 


r 
7 -f oad, 
The same argument is used to control | 4} sup e 0 ZP | ds in the 
0 re[0,s] 


third term of (28). Then 


t P 
-f ġ(s)ds i a 
E| sup Je 40 mt < Cp, rno + Z? + LP) 
oe 


T n 
r p 
+c? f E| sup (e7/4@44m,)° | ds. 
p.T 0 | ( r) 


re[0,s] 


Then by Gronwall lemma 


t P T 

-f o(s)ds f C? rds T 

E| sup Je /0 my < e/0 (me + f C3 rong + ZP + L? ds) 
te[0,T] 0 


<e (mp + TCC} r0nb + ZP + LP). 


So we finally obtain that 


m = 1 F2 
| sup e YO! — ow | 
te[0,7] 


2 2 2 
< Cpr (lo — 913? + 116! — 6713? +6" 7115). p= 2 


where 


t 
yt) := pf ġ(r)dr. 

Lemma 3 Let $(\omega^1, \xi^1)$ and $(\omega^2, \xi^2)$ be two solutions of the 2D Euler equation
with $\bar{\omega}_t := \omega_t^1 - \omega_t^2$ and $\bar{\xi} := \xi^1 - \xi^2$. Then there exist constants $C$⁷ such that the
following estimates hold:


Q := [ön E! -Yol — E- Yop) < Cll@rll3 + Clo lk allEl3- 
A := (El. Vol — E°. Var, E! -Vol — E°- Var) < Clai + Clo Ikoll 
|B] < (Clot lk 2 + Ola + CIEI + Cloi lk allElii.2 
where 
B := (&, £! -V G ; Vor ) ~2£2.V G ; Vo? )). 


and k > 4. 
Proof For the difference terms which include £! and £° we use that 


El. Væl 8 Va? = E Var +£? - Vä. 
We have 
Q = | (č, £! - Vor — E°- Var)| 
< |(w} — @, E! — E°) - Vo})| + Kol — o7, £- V(@} — @?))| 


1 1 
zlo; — e713 + FIV; Moll! — £13 


1 _ C = 
zl + Fler klé 


IA 


with k > 3, since the second scalar product is zero due to the fact that V - £ 2 = 0. 
Also 


A= (E! . Vol — £7. Vo, é! Væl — E°- Va?) = IE! - Vol — £- Voli 
< IE! -£3 - Voll? + l£? - Vo} - oD? 
< lE! — E (3 Veo Z + Clo! — o3 
< Clot lk lEI + ClaN? 


where k > 3. For the higher order term we have 


7 C differs from line to line and from term to term depending on the Sobolev embedding we use. 
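
$$\begin{aligned} B = {} & \langle \omega_t^1 - \omega_t^2, \xi^1 \cdot \nabla(\xi^1 \cdot \nabla\omega_t^1) - \xi^2 \cdot \nabla(\xi^2 \cdot \nabla\omega_t^2) \rangle \\ = {} & \langle \omega_t^1 - \omega_t^2, (\xi^1 - \xi^2) \cdot \nabla(\xi^1 \cdot \nabla\omega_t^1) \rangle \\ & + \langle \omega_t^1 - \omega_t^2, \xi^2 \cdot \nabla((\xi^1 - \xi^2) \cdot \nabla\omega_t^1) \rangle \\ & + \langle \omega_t^1 - \omega_t^2, \xi^2 \cdot \nabla(\xi^2 \cdot \nabla(\omega_t^1 - \omega_t^2)) \rangle \\ =: {} & a + b + c. \end{aligned}$$

Note that $c$ is non-positive:

$$\langle \omega_t^1 - \omega_t^2, \xi^2 \cdot \nabla(\xi^2 \cdot \nabla(\omega_t^1 - \omega_t^2)) \rangle = -\langle \xi^2 \cdot \nabla(\omega_t^1 - \omega_t^2), \xi^2 \cdot \nabla(\omega_t^1 - \omega_t^2) \rangle = -\|\xi^2 \cdot \nabla(\omega_t^1 - \omega_t^2)\|_2^2 \le 0$$

so $|B| \le |a| + |b|$. We estimate $|a|$ as follows: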
1 1 
la| = (o! — œf, E! — &)- VE! - Vol) < sliver Vo, lZ lol — @7 115 + zlé’ -° 


1 
lol = oF 13 + SI -°l 


IA 


142 
Flo 


Co ie iat a e 
yle lk 2ll& lla + zll 


with k > 4. Likewise, we estimate |b|: 


1 1 
Ibl = Kol = 07,6? -V (E = £)- Vo!) 1 < lho — 713 + 516? Y (6-6 - Vor!) 1B 


1 1 242 1 
= zler a; ll 4 aK. 


Now 
K < JE: VE! — &7)- Volli + lE El- E -VYD = Ki + Ko 


where 


Ky < lE? - Vol ZV! — EDI 
< Cloi Ii lE! — ETa 


TOs att ae 5 
< Clo, lkoll — El 2 


and 
2 ly y2ye1 242
K2 < CIES- VV, ialle” — Ella 
2 1) 2 1 242 
< CWE” VV, It alle — F°llt2 
12 1 22 
< Clo; Mlk 2l — Fll1.2 
for k > 4. Then 
142 E2 
K < 2C|lo; llk allEll4.2 


and therefore 
1 - 
lol < 5 lö? + Clos Ik 2E. 


which gives 


C 1\ l1- = 
e| < (Sliotla + 5) hand + 51813 + Clo} 2 AE. 


5 Numerical Results 


In this section, we show the results we obtained for Example 1 in Sect. 3. We
implemented the main equation (3) with added forcing and damping, on a unit
square domain with doubly periodic boundary conditions,

$$d\omega_t + u_t \cdot \nabla\omega_t\,dt + \xi \cdot \nabla\omega_t \circ dW_t = (Q - r\omega_t)\,dt \qquad (29)$$

where we chose $r = 0.001$ and $Q(x, y) = 0.01(\cos(8\pi y) + \sin(8\pi x))$. Note that,
since the added forcing term is of bounded variation, (17) is unchanged for (29).

We considered a $\xi$ whose parametric form with respect to the Fourier basis
consists of only one $\alpha$. The stream function of our chosen $\xi$ is given by

$$\zeta(x, y) = \alpha \left( \cos(k_1 2\pi x) \cos(k_2 2\pi y) - \sin(k_1 2\pi x) \sin(k_2 2\pi y) \right). \qquad (30)$$

Note that

$$\zeta = \frac{\alpha}{2} \left( e^{2\pi i k \cdot x} + e^{-2\pi i k \cdot x} \right) \qquad (31)$$

and

$$\xi = i \alpha \pi k^\perp \left( e^{2\pi i k \cdot x} - e^{-2\pi i k \cdot x} \right). \qquad (32)$$

Fig. 1 Snapshots of the numerical solution $\omega(t, x)$ to (29) at times $t = 0$ (left) and $t = 1$ (right). The colour bar shows the vorticity $\omega$, ranging from $-0.9$ to $0.9$


To discretise (29), we followed the methods documented in [4]—a mixed Finite 
Element method was used for the spatial derivatives, and an explicit strong stability 
preserving Runge-Kutta scheme of order 3 was used for the time derivative. We 
added the forcing and damping terms to help with maintaining the statistical 
homogeneity of the numerical solution, once it has reached a spun-up state from 
some set initial state. Our choice for the set initial state was 


$$\omega(0, x, y) = \sin(8\pi x) \sin(8\pi y) + 0.4 \cos(6\pi x) \cos(6\pi y) + 0.3 \cos(10\pi x) \cos(4\pi y) + 0.02 \sin(2\pi y) + 0.02 \sin(2\pi x). \qquad (33)$$

Spatially, we chose a grid of 64 × 64 cells. We first spun up the system until it
reached a statistical equilibrium state. This statistical equilibrium state was then set
as the initial condition for our experiment. Figure 1 shows a snapshot of the obtained
initial condition. Over the spin-up phase, we used $\alpha = 0.000001$ and $k^T = (2, 4)$.

The time horizon for the experiment data was chosen to be the unit interval, i.e.
we generated data $\omega^*(t_i, x)$ for $0 = t_0 < t_1 < \dots < t_N = 1$. See Fig. 1 for
snapshots of $\omega^*(0, x)$ and $\omega^*(1, x)$. When generating the data, we used the larger
value of $\alpha = 0.001$. This was to avoid any possible numerical issues⁸ when we
attempted to recover $\alpha$ from data.

⁸ When $\alpha$ is small, $\alpha^2$ is close to machine precision.

Assuming we know in advance the exact Fourier wavenumber $k$, the linear
system for estimation reduces to
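
$$[\omega]_{t,N}(x) := \sum_{i=1}^{N} \left( \omega^*_{t_i}(x) - \omega^*_{t_{i-1}}(x) \right)^2 \approx \alpha^2\,4\pi^2\,B(t, k, x)\,e_k(x) \qquad (34)$$

where

$$B(t, k, x) := \int_0^t \left( k^\perp \cdot \nabla\omega_s(x) \right)^2 ds \qquad (35)$$

and

$$e_k(x, y) = \left( \cos(k_1 2\pi x) \sin(k_2 2\pi y) + \sin(k_1 2\pi x) \cos(k_2 2\pi y) \right)^2. \qquad (36)$$

Thus our estimate for $\alpha$ is given by

$$\hat{\alpha}_N^2 = \frac{1}{4\pi^2}\,\frac{\int_{\mathbb{T}^2} [\omega]_{t,N}\,dx}{\int_{\mathbb{T}^2} B(t, k, x)\,e_k\,dx}. \qquad (37)$$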


Remark 4 In (37), we applied spatial averaging to stabilise estimation. 


Remark 5 The assumption that we know $k$ in advance is of course too strong from
the applications viewpoint. The aim of this experiment is to test the strength of
the pathwise approach under the assumption of “perfect knowledge”. If we cannot
accurately recover $\alpha$ in this case, then getting a good estimate for $\alpha$ using the
pathwise approach may be too difficult or impractical in more realistic scenarios.
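
The structure of the estimator (37), a realised quadratic variation divided by a time integral evaluated with the trapezoidal rule, can be illustrated on a scalar toy problem. The Python sketch below is not the finite element experiment of this section: it assumes the hypothetical Stratonovich SDE $dX_t = \alpha X_t \circ dW_t$ with $X_0 = 1$, whose exact solution is $X_t = \exp(\alpha W_t)$ and whose quadratic variation is $\alpha^2 \int_0^t X_s^2\,ds$. The seed and the subsampling strides are arbitrary choices.

```python
import numpy as np

def estimate_alpha(X, T=1.0):
    """Ratio estimator: realised quadratic variation over the integrated squared state."""
    N = len(X) - 1
    dt = T / N
    realised_qv = np.sum(np.diff(X) ** 2)
    integral = dt * (np.sum(X**2) - 0.5 * (X[0]**2 + X[-1]**2))  # trapezoidal rule
    return np.sqrt(realised_qv / integral)

rng = np.random.default_rng(1)
alpha_true, N_max = 1e-3, 200_000

# Exact solution of the toy Stratonovich SDE dX = alpha * X o dW, X_0 = 1.
dW = rng.standard_normal(N_max) * np.sqrt(1.0 / N_max)
W = np.concatenate(([0.0], np.cumsum(dW)))
X = np.exp(alpha_true * W)

rel_err = {}
for stride in (80, 20, 1):               # N = 2500, 10000, 200000 snapshots
    Xs = X[::stride]
    rel_err[len(Xs) - 1] = abs(estimate_alpha(Xs) - alpha_true) / alpha_true
print(rel_err)
```

For this toy model the relative error of the estimate is expected to scale like $N^{-1/2}$ on average, although a single path need not show a monotone decrease.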


Figure 2 shows snapshots of $[\omega]_{t,N}(x)$ and $B(t, k, x)\,e_k(x)$. We applied (37) for
different values of $N$. In each case, the time integral that constitutes $B(t, k, x)$ was
approximated using a simple trapezoidal rule, for which the same number $N$ of data
snapshots was used. Figure 3 shows the results for the relative error

$$\mathrm{err}_N := \frac{|\hat{\alpha}_N - \alpha|}{\alpha} \qquad (38)$$

for the different values of $N$. The results show that, in the worst case of $N = 2500$,
the relative error was no greater than 0.89. This translates to an estimate in the
range 0.001 ± 0.00089. The best case was when all 200,000 data samples
were used to estimate $\alpha$; the relative error in that case was 0.00135. This suggests
convergence and stabilisation of the sum for $[\omega]_{t,N}$.

For future work, we aim to test the pathwise approach for cases in which we do 
not know the exact selection of basis elements for £. Further, we wish to extend and 
test these ideas on coarse grained PDE data and compare with the results that were 
obtained in [4] using previously developed calibration methods. 

Fig. 2 Shown on the left is a snapshot of the estimate $[\omega]_{t,N}$, which was computed using
$N = 200{,}000$ data samples. Shown on the right is a snapshot of the basis element
$B_t(k, x) \left( \cos(k_1 2\pi x) \sin(k_2 2\pi y) + \sin(k_1 2\pi x) \cos(k_2 2\pi y) \right)^2$, which was approximated using the
same number $N$ of data samples

Fig. 3 The plot (in log-log scale) shows the relative error $\mathrm{err}_N$ defined in (38) as a
function of $N$, together with an $O(N^{-1/2})$ reference line. $\mathrm{err}_N$ was computed for $N$ = 2500, 5000, 10,000, 20,000, 40,000, 50,000,
66,667, 100,000, 200,000



Acknowledgments The authors would like to thank Prof Dan Crisan for the many helpful 
suggestions and constructive ideas he shared with them during the preparation of this work. They 
also thank Prof Darryl Holm, Prof Bertrand Chapron, Prof Etienne Mémin, and the whole STUOD 
team for many inspiring discussions they had during the STUOD meetings. 


Funding 

Both authors were partially supported by the European Research Council (ERC) under the 
European Union’s Horizon 2020 Research and Innovation Programme (ERC, Grant Agreement 
No 856408). 


Appendix 


Lemma 6 (Gronwall Lemma) Let $\beta : [0, T] \to [0, \infty)$ be a non-negative
absolutely continuous function that satisfies for a.e. $t$

$$d\beta(t) \le \phi(t)\,\beta(t)\,dt + \psi(t)\,dt$$

where $\phi$, $\psi$ are non-negative integrable functions on $[0, T]$. Then

$$\beta(t) \le e^{\int_0^t \phi(s)\,ds} \left( \beta(0) + \int_0^t \psi(s)\,ds \right)$$

for all $t \in [0, T]$.

Stochastic Parameterization with
Dynamic Mode Decomposition


Long Li, Etienne Mémin, and Gilles Tissot 


Abstract A physical stochastic parameterization is adopted in this work to account 
for the effects of the unresolved small-scale on the large-scale flow dynamics. This 
random model is based on a stochastic transport principle, which ensures a strong 
energy conservation. The dynamic mode decomposition (DMD) is performed on 
high-resolution data to learn a basis of the unresolved velocity field, on which 
the stochastic transport velocity is expressed. The time-harmonic property of DMD
modes allows us to perform a clean separation between time-differentiable and
time-decorrelated components. This random scheme is assessed on a quasi-geostrophic
(QG) model.


Keywords Stochastic parameterization - Dynamical system - Data-driven 


1 Introduction 


The modelling under location uncertainty (LU) setting has been shown to provide
consistent physical representations of fluid dynamics [10, 12]. This representation
introduces a random component to describe the unresolved flow components.
This makes it possible to consider less dissipative systems than their classical large-scale
counterparts. Nevertheless, the ability of such a model to represent faithfully the
uncertainties associated with the actual unresolved small scales depends strongly on
the definition of the random component and on its evolution in time. Unsurprisingly,
stationarity/time-varying and homogeneity/inhomogeneity characteristics
have strong influences on the results [1, 2]. Another important aspect concerns the
ability to include in the noise representation a stationary drift component associated
with the temporal mean of the high-resolution fluctuations. As shown in this paper, such
a stationary drift can be elegantly introduced in the noise through the Girsanov theorem.
Yet, large-scale persistent components associated with the high-resolution fluctuations
are not strictly stationary, and slowly varying quasi-periodic components might be
important to include. To that purpose we devise a noise generation scheme relying
on the dynamic mode decomposition [13]. Such a decomposition, or other related
techniques aiming to provide a spectral representation of the Koopman operator [11],
will allow us to represent the noise as a superposition of random and deterministic
harmonic oscillators. The former are attached to the fast components whereas the
latter represent the slow fluctuation components. As demonstrated in Sect. 4, this
strategy yields a very efficient technique for the ocean double-gyre configuration.


2 Modelling Under Location Uncertainty 


In this section, we briefly review the LU setting and the associated random QG 
model that will be used for the numerical evaluations. 


2.1 Stochastic Flow 


The evolution of a Lagrangian particle trajectory $(X_t)$ under LU is described by the
following stochastic differential equation (SDE):

$$dX_t(x) = v(X_t(x), t)\,dt + \sigma(X_t(x), t)\,dB_t, \qquad X_0(x) = x \in \mathcal{D}, \qquad (1)$$

where $v$ denotes the time-smooth resolved velocity that is both spatially and temporally
correlated, $\sigma dB_t$ stands for the fast oscillating unresolved flow component
(also called noise in the following) that is only correlated in space, and $\mathcal{D} \subset \mathbb{C}^d$
($d = 2$ or $3$) is a bounded spatial domain.

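We now give the mathematical definitions of the noise. In the following, let
us fix a finite time $T < \infty$ and the Hilbert space $H = (L^2(\mathcal{D}))^d$ with the
inner product $\langle f, g \rangle_H = \int_{\mathcal{D}} f^\dagger(x)\,g(x)\,dx$ and the norm $\|f\|_H = \langle f, f \rangle_H^{1/2}$,
where $\bullet^\dagger$ stands for the transpose-conjugate operation. Then, $\{B_t\}_{0 \le t \le T}$ is an $H$-valued
cylindrical Brownian motion (see definition in [4]) on a filtered probability space
$(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{0 \le t \le T}, \mathbb{P})$, with the covariance operator $\mathrm{diag}(I_d)$ (where $I_d$ is a $d$-dimensional
vector of identity operators). For each $(\omega, t) \in \Omega \times [0, T]$, we constrain
$\sigma(\cdot, t)[\bullet]$ to be a (random) Hilbert-Schmidt integral operator on $H$ with a bounded
matrix kernel $\breve{\sigma} = (\breve{\sigma}_{ij})_{i,j=1,\dots,d}$ such that

$$\sigma(x, t) f = \int_{\mathcal{D}} \breve{\sigma}(x, y, t)\,f(y)\,dy, \qquad f \in H, \quad x \in \mathcal{D}. \qquad (2a)$$

Its adjoint operator $\sigma^*(\cdot, t)[\bullet]$, satisfying $\langle \sigma(\cdot, t) f, g \rangle_H = \langle f, \sigma^*(\cdot, t) g \rangle_H$, reads: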
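
$$\sigma^*(x, t) g = \int_{\mathcal{D}} \breve{\sigma}^\dagger(y, x, t)\,g(y)\,dy, \qquad g \in H, \quad x \in \mathcal{D}. \qquad (2b)$$

The composite operator $\sigma(\cdot, t)\sigma^*(\cdot, t)[\bullet]$ is trace class on $H$ and admits eigenfunctions
$\xi_n(\cdot, t)$ with eigenvalues $\lambda_n(t)$ satisfying $\sum_{n \in \mathbb{N}} \lambda_n(t) < +\infty$. The noise can
then be equally defined by the spectral decomposition:

$$\sigma(x, t)\,dB_t = \sum_{n \in \mathbb{N}} \lambda_n^{1/2}(t)\,\xi_n(x, t)\,d\beta_n(t), \qquad (3)$$

where $\beta_n$ are independent standard Brownian motions. In addition, we assume that
the operator-space-valued process $\{\sigma(\cdot, t)[\bullet]\}_{0 \le t \le T}$ is stochastically integrable,
i.e. $\mathbb{P}\left[ \int_0^T \sum_{n \in \mathbb{N}} \lambda_n(t)\,dt < +\infty \right] = 1$. From [4], the stochastic integral
$\{\int_0^t \sigma(\cdot, s)\,dB_s\}_{0 \le t \le T}$ is a continuous square integrable $H$-valued martingale,
hence a centered Gaussian process, $\mathbb{E}_{\mathbb{P}}\left[ \int_0^t \sigma(\cdot, s)\,dB_s \right] = 0$, of bounded variance,
$\mathbb{E}_{\mathbb{P}}\left[ \left\| \int_0^t \sigma(\cdot, s)\,dB_s \right\|_H^2 \right] < +\infty$. Moreover, the joint quadratic variation process of
the noise, evaluated at the same point $x \in \mathcal{D}$, is given by

$$\left\langle \int_0^{\cdot} \sigma(x, s)\,dB_s, \int_0^{\cdot} \sigma(x, s)\,dB_s \right\rangle_t = \int_0^t a(x, s)\,ds, \qquad (4a)$$

$$a(x, t) = \int_{\mathcal{D}} \breve{\sigma}(x, y, t)\,\breve{\sigma}^\dagger(x, y, t)\,dy = \sum_{n \in \mathbb{N}} \lambda_n(t)\,\xi_n(x, t)\,\xi_n^\dagger(x, t). \qquad (4b)$$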

We remark that real-valued noise can be achieved by adding the constraint that the
eigenfunctions, the eigenvalues and the standard Brownian motions in (3) are all organised
in complex-conjugate pairs. In that case, its joint quadratic variation process is
real-valued as well.
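
To make the spectral decomposition (3) and the variance formula (4b) concrete, the following Python sketch draws increments of a truncated, real-valued noise and compares their empirical variance with $a(x)\,dt$. Everything specific here is an assumption made for illustration: a scalar field on a 1D periodic grid instead of a vector field on $\mathcal{D}$, four time-independent sine modes as hypothetical eigenfunctions $\xi_n$, and eigenvalues $\lambda_n = 1/n^2$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical real-valued modes xi_n(x) on a 1D periodic grid, with decaying lambda_n.
n_grid, n_modes, dt = 64, 4, 0.01
x = np.arange(n_grid) / n_grid
xi = np.stack([np.sqrt(2.0) * np.sin(2 * np.pi * (n + 1) * x) for n in range(n_modes)])
lam = 1.0 / (1.0 + np.arange(n_modes)) ** 2

def noise_increment(n_samples):
    """Draw sigma dB_t = sum_n sqrt(lam_n) xi_n(x) d(beta_n) over one step dt."""
    dbeta = rng.standard_normal((n_samples, n_modes)) * np.sqrt(dt)
    return (dbeta * np.sqrt(lam)) @ xi          # shape (n_samples, n_grid)

samples = noise_increment(20_000)
empirical_var = samples.var(axis=0)
a_diag = (lam[:, None] * xi**2).sum(axis=0)     # a(x) = sum_n lam_n xi_n(x)^2
print(np.abs(empirical_var - a_diag * dt).max())
```

With real modes the conjugate-pair constraint mentioned above is satisfied trivially, and the pointwise variance of one increment is $a(x)\,dt$ up to sampling error.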

The previous formulations consist of only a zero-mean and temporally uncorrelated
noise. However, this might not be enough, and including a mean or
time-correlated component of the unresolved velocity field could be of crucial
importance to obtain a relevant model. For instance, the eddy parametrization
proposed by [15] is decomposed into a deterministic mean term and a stochastic
term of zero mean. For the double-gyre circulation configuration, the considered
deterministic parametrization makes it possible to reproduce the eastward jet in the
coarse-resolution model, while the additional stochastic terms enhance the gyre circulation
and improve the flow variability. Similarly, the random-forcing model proposed
by [3] consists of a space-time correlated stochastic process that enhances the jet
extension. The slow modes of the sub-grid scales can be provided by adequate
high-pass filtering of high-resolution data on the coarse grid. We aim in this work at
investigating the incorporation of such slow components within the LU framework.
However, the derivation of LU models [10, 12, 1] relies on the martingale properties
of the centered noise, and hence we need to properly handle non-centered Brownian
terms. The Girsanov transformation [4] provides a theoretical tool that fully
warrants such a superposition: by a change of the probability measure, the composed
noise can be centered with respect to a new probability measure while an additional
drift term appears, which pulls back time-correlated sub-grid-scale components into
the dynamical system. The associated mathematical description is given as follows.
Let $\Gamma_t$ be an $H$-valued $\mathcal{F}_t$-predictable process satisfying the Novikov condition,
$\mathbb{E}_{\mathbb{P}}\left[ \exp\left( \frac{1}{2} \int_0^T \|\Gamma_t\|_H^2\,dt \right) \right] < +\infty$. Then the process $\{\tilde{B}_t := B_t + \int_0^t \Gamma_s\,ds\}_{0 \le t \le T}$ is an
$H$-valued cylindrical Wiener process on $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{0 \le t \le T}, \tilde{\mathbb{P}})$ with Radon-Nikodym
derivative

$$\frac{d\tilde{\mathbb{P}}}{d\mathbb{P}} = \exp\left( -\int_0^T \langle \Gamma_t, dB_t \rangle_H - \frac{1}{2} \int_0^T \|\Gamma_t\|_H^2\,dt \right). \qquad (5a)$$

In this case, the SDE (1) under the probability measure $\tilde{\mathbb{P}}$ reads:

$$dX_t = \left( v(X_t, t) - \sigma(X_t, t)\,\Gamma_t \right) dt + \sigma(X_t, t)\,d\tilde{B}_t. \qquad (5b)$$

In the present work, we shall rather consider this modified stochastic flow, defined on
$(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{0 \le t \le T}, \tilde{\mathbb{P}})$ with $\mathbb{E}_{\tilde{\mathbb{P}}}[\sigma d\tilde{B}_t] = 0$, as the physical solution. Hereafter, $\sigma\Gamma_t$ is
referred to as the Girsanov drift.


2.2 Stochastic QG Model 


The evolution law of a random tracer (function) $\Theta$ transported along the stochastic
flow, $\Theta(X_{t+\delta t}, t + \delta t) = \Theta(X_t, t)$, was derived in [10, 1]. Under the probability
measure $\tilde{\mathbb{P}}$, this can be described by the following stochastic partial differential
equation (SPDE), namely

$$\mathbb{D}_t \Theta := d_t \Theta + (v^*\,dt + \sigma d\tilde{B}_t) \cdot \nabla\Theta - \frac{1}{2} \nabla \cdot (a \nabla\Theta)\,dt = 0, \qquad (6a)$$

$$v^* := v - \frac{1}{2} \nabla \cdot a + \sigma^*(\nabla \cdot \sigma) - \sigma\Gamma_t. \qquad (6b)$$

In this SPDE, the first term $d_t \Theta(x) := \Theta(x, t + \delta t) - \Theta(x, t)$ stands for the
(forward) increment of $\Theta$ at a fixed point $x \in \mathcal{D}$; the second term describes
the tracer's advection by an effective drift $v^*$ and the noise $\sigma d\tilde{B}_t$; the last term
depicts the tracer's diffusion through the noise quadratic variation $a$. The effective
drift (6b) ensues from (i) the noise inhomogeneity, (ii) the possible unresolved
flow divergence and (iii) the statistical correction due to the change of probability
measures, respectively.

The derivation of the stochastic geophysical models under the LU framework
follows exactly the same path as the deterministic derivation, together with a proper
scaling of the noise and its amplitude. In particular, a continuously stratified QG
model under LU has been derived in [12, 9] using an asymptotic approach. With
horizontally moderate and vertically weak noises (see definitions in [12, 9]), the
governing equations under the probability measure $\tilde{\mathbb{P}}$ read:


Evolution of potential vorticity (PV): 


a ret tt , 
g= ie Y dr + (0d), u) — (5V (Bka Vui) + Baxain) dt, Oa) 


From PV to streamfunction: 


$$\nabla^2\psi + \partial_z\Big(\frac{f_0^2}{N^2}\,\partial_z\psi\Big) = q - \beta y, \tag{7b}$$


Incompressible constraints: 


$$u = \nabla^\perp\psi, \qquad \nabla\cdot\sigma\,d\widetilde{B}_t = \nabla\cdot(u^\star - u) = 0. \tag{7c}$$


Here, $\nabla = [\partial_x, \partial_y]^T$, $\nabla^\perp = [-\partial_y, \partial_x]^T$, $\nabla^2 = \partial_{xx}^2 + \partial_{yy}^2$ denote two-dimensional
operators and $J(f, g) = \partial_x f\,\partial_y g - \partial_x g\,\partial_y f$ stands for the Jacobian operator. The
vector fields $u$, $\sigma\,d\widetilde{B}_t$ and the tensor field $a$ are two-dimensional (2D) horizontal
quantities. The horizontal effective drift is defined as $u^\star := u - \nabla\cdot(a/2) - \sigma\Gamma_t$.
The scalar fields $q$ and $\psi$ represent the PV and the streamfunction. In Eq. (7b),
$N^2 = -(g/\rho_0)\,\partial_z\rho$ is the Brunt-Väisälä (or buoyancy) frequency with $g$ the gravity
value, $\rho_0$ the background density, $\rho$ the density anomaly, and $f_0 + \beta y$ is the Coriolis
parameter under a beta-plane approximation. As shown in [1], one important
characteristic of the random model (7) is that it conserves the total energy of the
resolved flow (under natural boundary condition) for any realization (i.e. pathwise).
This property highlights a strong relation between the classical deterministic model
and the stochastic formulation.


3 Numerical Parameterization of Unresolved Flow 


Data-driven approaches are presented in this section to estimate the spatial
correlation functions of the unresolved flow component based on the spectral
decomposition (3). In practice, we work with a finite set of functions to represent the
small-scale Eulerian velocity fluctuations rather than with the Lagrangian particles
trajectory. We first review the empirical orthogonal functions (EOF) method for
which the noise covariance is assumed quasi-stationary. We then propose an
approach relying on the dynamic mode decomposition (DMD) to account for the
temporal behavior of the spatial correlations.


3.1 EOF-Based Method


In the following, let $\{u_{HR}(x, t_i)\}_{i=1,\ldots,N}$ be the set of velocity snapshots provided
by a high-resolution (HR) simulation. We first build the spatial local fluctuations
$u'_f(x, t_i)$ of each snapshot on the coarse-grid points. In particular, for the QG system
(7), one can first perform a high-pass filtering with a 2D Gaussian convolution
kernel $G$ on each HR streamfunction $\psi_{HR}$, to obtain the streamfunction fluctuations,
$\psi'_f(x, t_i) = ((I - G) \star \psi_{HR})(x, t_i)$ (only for the coarse-grid points $x$). Then, the
geostrophic velocity fluctuations can be derived by $u'_f = \nabla^\perp\psi'_f$. We next centre
the data set by $u''_f = u'_f - \overline{u'_f}^t$ (with $\overline{\bullet}^t$ the temporal mean) and perform the
EOF procedure [9] to get a set of orthogonal temporal modes $\{\alpha_m\}_{m=1,\ldots,N}$ and
orthonormal spatial modes $\{\phi_m\}_{m=1,\ldots,N}$ satisfying

$$u''_f(x, t_i) = \sum_{m=1}^{N} \alpha_m(t_i)\,\phi_m(x), \qquad \overline{\alpha_m\alpha_n}^t = \lambda_m\delta_{m,n}. \tag{8}$$

Truncating the modes (with $M < N$) and rescaling by a small-scale decorrelation
time $\tau$, the stationary noise and its quadratic variation can be built by

$$\sigma(x)\,dB_t = \sqrt{\tau}\sum_{m=1}^{M}\sqrt{\lambda_m}\,\phi_m(x)\,d\beta_m(t), \qquad a(x) = \tau\sum_{m=1}^{M}\lambda_m\,\phi_m(x)\,\phi_m^T(x). \tag{9}$$

Note that this time scale $\tau$ is used to match the fact that the noise in (5b) has
the physical dimension of a length. In practice, we often consider the coarse-grid
simulation timestep $\Delta t_{LR}$. In addition, the Girsanov drift is set to be $\sigma(x)\Gamma_t = \overline{u'_f}^t(x)$.
It means that the Girsanov drift here is the projection of the temporal
mean of the sub-grid scales onto the EOFs, i.e. $\sigma(x)\Gamma_t = \sum_m \gamma_m\phi_m(x)$ with
$\gamma_m = \langle\overline{u'_f}^t, \phi_m\rangle_H$ satisfying $\sum_m \gamma_m^2 < +\infty$.
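
To make the construction (9) concrete, the following Python sketch assembles one noise
increment and the quadratic variation from a given set of modes. It is a toy illustration
rather than the implementation used in this work: the modes are scalar-valued on a
three-point "grid" (velocity modes would carry two components per point), the numerical
values are made up, and the function names are hypothetical.

```python
import math
import random


def quadratic_variation(modes, eigenvalues, tau):
    """Return a[i][j] = tau * sum_m lambda_m * phi_m[i] * phi_m[j], cf. (9)."""
    n = len(modes[0])
    return [[tau * sum(lam * phi[i] * phi[j]
                       for lam, phi in zip(eigenvalues, modes))
             for j in range(n)] for i in range(n)]


def noise_increment(modes, eigenvalues, tau, dt, rng):
    """Draw one sample of sqrt(tau) * sum_m sqrt(lambda_m) * phi_m * dbeta_m."""
    n = len(modes[0])
    increment = [0.0] * n
    for lam, phi in zip(eigenvalues, modes):
        # Independent scalar Brownian increment with variance dt.
        dbeta = rng.gauss(0.0, math.sqrt(dt))
        coeff = math.sqrt(tau * lam) * dbeta
        for i in range(n):
            increment[i] += coeff * phi[i]
    return increment


# Toy data (hypothetical): two orthonormal modes on a 3-point grid.
modes = [[1.0, 0.0, 0.0], [0.0, math.sqrt(0.5), math.sqrt(0.5)]]
eigenvalues = [2.0, 0.5]
tau, dt = 0.1, 0.01

a = quadratic_variation(modes, eigenvalues, tau)
sample = noise_increment(modes, eigenvalues, tau, dt, random.Random(0))
```

By construction, the variance of each component of the increment equals the matching
diagonal entry of `a` multiplied by `dt`, which can be verified empirically by drawing
many samples.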


3.2 DMD-Based Method 


The DMD algorithm [13] seeks a spectral decomposition of the best-fit linear
operator $A$ that relates two successive snapshots:

$$u'_f(x, t_{i+1}) \approx A\,u'_f(x, t_i). \tag{10a}$$

Applying the exact DMD procedure proposed by [14], the corresponding spectral
expansion in continuous time reads

$$u'_f(x, t) = \sum_{m=1}^{N} b_m \exp\big((\sigma_m + i\omega_m)\,t\big)\,\phi_m(x), \tag{10b}$$


where $\phi_m(x)$ are the complex-valued DMD modes (eigenvectors of $A$) associated to the DMD
eigenvalues $\mu_m \in \mathbb{C}$, $\sigma_m = \log(|\mu_m|)/\Delta t_s \in \mathbb{R}$ are the modes' growth rates (with
$\Delta t_s = t_{i+1} - t_i$ the sampling step of data), $\omega_m = \arg(\mu_m)/\Delta t_s \in \mathbb{R}$ are the modes'
frequencies (with $i$ the imaginary unit) and $b_m \in \mathbb{C}$ are the modes' amplitudes. In
practice, our data set of velocity fluctuations is real valued, hence the DMD modes
(also eigenvalues and amplitudes) are two-by-two complex conjugates, i.e.
$\phi_{2p} = \overline{\phi_{2p-1}}$ $(p = 1, \ldots, N/2)$.
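
The conversion from a discrete-time DMD eigenvalue to a continuous-time growth rate and
frequency described above is a one-line computation. The short Python sketch below shows it
with the standard `cmath` module; the function name and the sample values are hypothetical
and serve only as an illustration.

```python
import cmath
import math


def growth_rate_and_frequency(mu, dt_s):
    """Map a DMD eigenvalue mu to (growth rate, frequency) for sampling step dt_s."""
    sigma = math.log(abs(mu)) / dt_s   # log|mu| / dt_s
    omega = cmath.phase(mu) / dt_s     # arg(mu) / dt_s, with arg in (-pi, pi]
    return sigma, omega


# Hypothetical eigenvalue built from a known rate and frequency.
dt_s = 0.5
mu = cmath.exp((-0.1 + 2.0j) * dt_s)
sigma, omega = growth_rate_and_frequency(mu, dt_s)
```

Since the argument is only defined modulo $2\pi$, frequencies above $\pi/\Delta t_s$ in magnitude are
aliased; the conjugate eigenvalue yields the same growth rate and the opposite frequency.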

We next propose to split the total set of DMD modes into two subsets, $\mathcal{M}^c$ and
$\mathcal{M}^u$, to select separately adequate fast and slow modes for the noise (from $\mathcal{M}^u$) and
the Girsanov drift (from $\mathcal{M}^c$), respectively, according to the following analysis of
frequencies and amplitudes:

$$\mathcal{M}^c = \Big\{m \in [1, N] \;\Big|\; |\mu_m| \approx 1,\ |\omega_m| \le \frac{\pi}{\tau_c},\ |b_m| \ge C\Big\}, \tag{11a}$$

$$\mathcal{M}^u = \Big\{m \in [1, N] \;\Big|\; |\mu_m| \approx 1,\ |\omega_m| > \frac{\pi}{\tau_c},\ |b_m| \ge C\Big\}, \tag{11b}$$

where $\tau_c$ is a temporal separation scale that can be estimated by the spatial mean
of the autocorrelation functions of data and $C$ denotes an empirical cutoff of
amplitudes. The DMD modes that are included neither in $\mathcal{M}^c$ nor in $\mathcal{M}^u$ are discarded. An
example of spectrum and amplitudes of the selected DMD modes is shown in Fig. 1.
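
The selection rules (11a) and (11b) can be sketched in a few lines of Python. This is a
hedged illustration only: the tolerance `tol` used to decide whether $|\mu_m|$ is close to
one is our own assumption (the text does not quantify it), indices are zero-based, and
the function name and sample values are hypothetical.

```python
import cmath
import math


def split_dmd_modes(eigenvalues, amplitudes, dt_s, tau_c, cutoff, tol=0.05):
    """Return (correlated, uncorrelated) index lists following (11a)-(11b)."""
    correlated, uncorrelated = [], []
    for m, (mu, b) in enumerate(zip(eigenvalues, amplitudes)):
        # Discard modes that grow/decay too fast or carry too little amplitude.
        if abs(abs(mu) - 1.0) > tol or abs(b) < cutoff:
            continue
        omega = cmath.phase(mu) / dt_s
        if abs(omega) <= math.pi / tau_c:
            correlated.append(m)    # slow modes, used for the Girsanov drift
        else:
            uncorrelated.append(m)  # fast modes, used for the noise
    return correlated, uncorrelated


# Hypothetical spectrum: sampling step 5 (days), separation scale 50 (days).
dt_s, tau_c = 5.0, 50.0
eigenvalues = [
    cmath.exp(1j * 0.01 * dt_s),        # slow, neutral
    cmath.exp(1j * 0.30 * dt_s),        # fast, neutral
    0.5 * cmath.exp(1j * 0.01 * dt_s),  # strongly damped, discarded
    cmath.exp(1j * 0.30 * dt_s),        # fast but tiny amplitude, discarded
]
amplitudes = [1.0, 0.8, 1.0, 0.001]
correlated, uncorrelated = split_dmd_modes(eigenvalues, amplitudes, dt_s, tau_c, cutoff=0.1)
```

In this toy spectrum only the first mode qualifies for the drift and only the second for
the noise; the remaining two are dropped.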
Fig. 1 Illustration of the selections of DMD modes used for the noise (orange, uncorrelated
set $\mathcal{M}^u$) and the Girsanov drift (blue, correlated set $\mathcal{M}^c$)

In order to avoid spurious effects associated with the non-orthogonality of DMD
modes, their amplitudes are rescaled such that the reconstructed data corresponds to
an orthogonal projection onto the subspace spanned by the modes in $\mathcal{M}^c$ or $\mathcal{M}^u$. In
particular, we propose to rescale those truncated DMD modes as follows:


(i) Construct the Gramian $G = (g_{m,n})_{m,n\in\mathcal{M}^c}$ with $g_{m,n} = \langle\phi_m, \phi_n\rangle_H$;
(ii) Invert the Gramian, $G^{-1} := (g^{m,n})_{m,n\in\mathcal{M}^c}$, and derive the dual set of the
truncated DMD modes by $\phi^\star_m = \sum_{n\in\mathcal{M}^c} g^{m,n}\phi_n$;
(iii) Project the initial state of data on the dual set of modes to update the
amplitudes: $\phi_m := \langle u'_f(\cdot, t_1), \phi^\star_m\rangle_H\,\phi_m$.

Such procedure holds separately for the DMD modes of $\mathcal{M}^c$ and $\mathcal{M}^u$. Finally,
the noise and the correction drift can be defined as

$$\sigma(x, t)\,dB_t = \sqrt{\tau}\sum_{m\in\mathcal{M}^u}\exp(i\omega_m t)\,\phi_m(x)\,d\beta_m(t), \tag{12a}$$

$$\sigma(x, t)\,\Gamma_t = \overline{u'_f}^t(x) + \sum_{m\in\mathcal{M}^c}\exp(i\omega_m t)\,\phi_m(x). \tag{12b}$$

In particular, we assume that each pair of the complex Brownian motions are
conjugates ($\beta_{2p} = \overline{\beta_{2p-1}}$) and their real and imaginary parts are independent. As
such, both noise $\sigma\,dB_t$ and correction drift $\sigma\Gamma_t$ are real-valued fields. In addition,
the joint quadratic variation of such noise remains stationary:

$$a(x) = \tau\sum_{m\in\mathcal{M}^u}\phi_m(x)\,\phi_m^\dagger(x). \tag{12c}$$

In a similar way as in the EOF-based method, we could also construct the Girsanov
drift by the projection of the RHS of (12b) onto the DMD modes. As we have
dropped the unstable DMD modes, one can show that the predictability and the
Novikov condition (presented in Sect. 2) of $\Gamma$ hold in this case.
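
The Gramian-based rescaling of steps (i) to (iii) above can be illustrated on the smallest
non-trivial case. The Python sketch below treats two real, non-orthogonal modes on a
three-point grid, so that the 2 x 2 Gramian can be inverted by hand; complex modes would
additionally require conjugation in the inner product. All names and values are
hypothetical and the code is a sketch, not the implementation used for the experiments.

```python
def inner(u, v):
    """Plain Euclidean inner product (stand-in for the inner product on H)."""
    return sum(a * b for a, b in zip(u, v))


def rescale_two_modes(data, phi_1, phi_2):
    """Rescale two modes so that their sum is the orthogonal projection of data."""
    # (i) Gramian of the two modes.
    g11, g12, g22 = inner(phi_1, phi_1), inner(phi_1, phi_2), inner(phi_2, phi_2)
    det = g11 * g22 - g12 * g12
    # (ii) Inverse Gramian and dual modes.
    i11, i12, i22 = g22 / det, -g12 / det, g11 / det
    dual_1 = [i11 * a + i12 * b for a, b in zip(phi_1, phi_2)]
    dual_2 = [i12 * a + i22 * b for a, b in zip(phi_1, phi_2)]
    # (iii) Amplitudes from the projection of the data on the dual modes.
    b1, b2 = inner(data, dual_1), inner(data, dual_2)
    return [b1 * a for a in phi_1], [b2 * b for b in phi_2]


data = [1.0, 2.0, 3.0]
phi_1 = [1.0, 0.0, 0.0]
phi_2 = [1.0, 1.0, 0.0]
mode_1, mode_2 = rescale_two_modes(data, phi_1, phi_2)
reconstruction = [a + b for a, b in zip(mode_1, mode_2)]
```

Here the two modes span the first two coordinates, so the reconstruction is `[1, 2, 0]` and
the residual `[0, 0, 3]` is orthogonal to both modes, as expected of an orthogonal
projection.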


4 Numerical Experiments 


In this section, we present some numerical results of the stochastic QG system (7).
The objective is to improve the variability of large-scale models defined on
coarse grids. To that end, a high-resolution deterministic reference model (REF) is
first simulated and compared to several coarse-resolution models: the benchmark
deterministic model (DET) and two stochastic models, one with an EOF-based noise
(STO-EOF) and one with a DMD-based noise (STO-DMD).


4.1 Configurations


In this study, we consider a vertically discretized QG dynamical core proposed in
[8] and extended in the stochastic setting [9]. This model consists of $n$ isopycnal
layers with constant thickness $H_k$ and density $\rho_k$ in each layer $k$. In this case, the
prognostic variables such as $\psi$ in (7) are assumed to be layer-averaged quantities.
Homogeneous Dirichlet boundary conditions have been imposed for the term
$f_0\,\partial_z\psi/N^2$ in (7b) at the ocean surface and bottom. Moreover, external forcing
and numerical dissipation are included in the evolution of PV (7a): the Ekman
pumping $\nabla^\perp\cdot\tau$ due to the wind stress $\tau$ over the ocean surface boundary, a linear
drag $-(f_0\delta_{ek}/2)\nabla^2\psi_n$ at the ocean bottom with a very thin thickness $\delta_{ek}$, and a
biharmonic dissipation $-A_4\nabla^4(\nabla^2\psi_k)$ in each layer with uniform coefficient $A_4$. In
particular, we consider here a finite box ocean driven by an idealized (stationary and
symmetric) wind stress $\tau = [-\tau_0\cos(2\pi y/L_y), 0]^T$. A mixed horizontal boundary
condition is used for the $k$-th layer streamfunction: $\psi_k|_{\partial A} = f_k(t)$ and
$\partial_n^2\psi_k|_{\partial A} = -(\alpha_{bc}/\Delta x)\,\partial_n\psi_k|_{\partial A}$ (same for the 4-th order derivative). Here, $A$ denotes the
2D area, $f_k$ is a time-dependent function constrained by mass conservation [7],
$\Delta x$ stands for the horizontal resolution and $\alpha_{bc}$ is a nondimensional coefficient
associated to the slip conditions [7]. A quiescent initial condition is used for
the REF, whereas a spin-up condition downsampled from REF (after a 90-year
integration) is adopted for all the coarse-resolution models. The common parameters
for all the simulations are listed in Table 1, whereas resolution-dependent parameters
are presented separately in Table 2. Both EOF and DMD modes are calibrated from
the REF data during 40 years (after the spin-up) with a 5-day sampling step. As for
the numerical discretization, a conservative flux form [9] together with a stochastic
Leapfrog scheme [5] is adopted for the evolution of PV (7a). The inversion of
the modified Helmholtz equation (7b) is carried out with a discrete sine transform
method [7].


Table 1 Common parameters for all the models. The buoyancy frequency $N^2$ in (7b) is
approximated by $g'_{k+0.5}/((H_k + H_{k+1})/2)$ on the interface between layers $k$ and $k + 1$

Parameters       Value                               Description
$X \times Y$     (3840 x 4800) km                    Domain size
$H_k$            (350, 750, 2900) m                  Mean layer thickness
$g'_{k+0.5}$     (0.025, 0.0125) m s$^{-2}$          Reduced gravity
$\delta_{ek}$    2 m                                 Bottom Ekman layer thickness
$\tau_0$         $2 \times 10^{-5}$ m$^2$ s$^{-2}$   Wind stress magnitude
$\alpha_{bc}$    0.2                                 Mixed boundary condition coefficient
$f_0$            $9.375 \times 10^{-5}$ s$^{-1}$     Mean Coriolis parameter
$\beta$          $1.754 \times 10^{-11}$ (m s)$^{-1}$  Coriolis parameter gradient
$r_m$            (39, 22) km                         Baroclinic Rossby radii

Table 2 Values of grid varying parameters. The energy proportion captured by the truncated EOF
modes is given in brackets. For the DMD method, the first number stands for the size of $\mathcal{M}^c$ (11a)
whereas the latter is the one of $\mathcal{M}^u$ (11b)

Resolution (km)   Timestep (s)   Viscosity (m$^4$ s$^{-1}$)   EOF modes    DMD modes
5                 600            $2 \times 10^{9}$            -            -
40                1200           $5 \times 10^{11}$           300 (83%)    14 + 46
80                1440           $5 \times 10^{12}$           300 (92%)    16 + 74
120               1800           $1 \times 10^{13}$           300 (97%)    16 + 110
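
As a worked check of the approximation stated in the caption of Table 1, the following
Python sketch evaluates $N^2$ at the two layer interfaces from the listed values. It assumes
that the caption is to be read as the reduced gravity divided by the mean thickness of the
two adjacent layers; the function name is hypothetical.

```python
def interface_buoyancy_frequency_squared(reduced_gravity, thickness):
    """N^2 at each interface: g'_{k+0.5} / ((H_k + H_{k+1}) / 2)."""
    return [g / (0.5 * (thickness[k] + thickness[k + 1]))
            for k, g in enumerate(reduced_gravity)]


# Values taken from Table 1 (reduced gravity in m s^-2, thickness in m).
n2 = interface_buoyancy_frequency_squared([0.025, 0.0125], [350.0, 750.0, 2900.0])
# n2[0] = 0.025 / 550 and n2[1] = 0.0125 / 1825, both in s^-2.
```

Under this reading, the upper interface has $N^2$ of roughly $4.5 \times 10^{-5}$ s$^{-2}$ and the lower one
roughly $6.8 \times 10^{-6}$ s$^{-2}$.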


[Figure 2: four panels, REF (5 km), DET (80 km), STO-EOF (80 km) and STO-DMD (80 km); axes x (km) and y (km)]

Fig. 2 Snapshots of surface PV provided by different simulations after 60-years integration. The
black arrows are the interpolated geostrophic velocities

Snapshots of the surface PV provided by the different simulations are shown in 
Fig. 2. The dynamics of REF (5 km) model is mainly characterized by a meandering 
eastward jet with adjacent recirculations, which results from the most active 
mesoscale eddies effect through baroclinic instability. However, this effect cannot be 
properly resolved once the horizontal resolution exceeds the baroclinic deformation 
radius maximum (39 km here). For instance, the DET (80 km) simulation generates 
only a smooth symmetric field. On the other hand, both STO-EOF and STO- 
DMD models are able to reproduce the eastward jet on the coarse mesh (80 km) 
by including the non-linear effects carried both by the unresolved noise and the 
correction drift. In particular, the STO-DMD model produces a stronger meridional 
perturbation along the jet and is able to capture some of the large-wave structures 
predicted by the REF model. The improvements brought by these random models 
will be diagnosed and analyzed more precisely in the following. 


4.2 Diagnostics 


We first compare the long-term mean (over a 100-year interval) of the kinetic
energy (KE) spectrum for the coarse models at different resolutions (40, 80,
120 km). As shown in Fig. 3, introducing only a dissipation mechanism like the
biharmonic viscosity in the DET coarse models leads to an excessive decrease of the
resolved KE compared to the REF model. Both STO-EOF and STO-DMD models, at
all resolutions, recover part of the lost energy over all wavenumbers.
In particular, the STO-DMD models provide higher KE backscattering at large
scales and a better spectrum slope in the inertial range than the stationary unresolved
models. This seems to highlight the importance of the non-stationary characteristic
of the noise and Girsanov drift.

Fig. 3 Temporal mean of vertically integrated KE spectra for the different models (power spectral
density against isotropic wavenumber; curves for REF, DET, STO-EOF and STO-DMD)

We then quantify the temporal variability (over the same 100-year interval)
predicted by the different coarse models. In this work, we adopt the following three
global metrics. The first one is the root-mean-square error (RMSE) between the
standard deviation of the streamfunction of a coarse model (denoted by $\sigma[\psi^M]$) and
the subsampled high-resolution one (denoted by $\sigma[\psi^R]$), $\|\sigma[\psi^M] - \sigma[\psi^R]\|_{L^2(D)}$
where $D = A \times [-H, 0]$ and $H$ stands for the total depth of the ocean basin.
The second criterion is the Gaussian relative entropy (GRE) [6] which assesses in a
single measure the mean and variance reconstruction:

$$\mathrm{GRE} = \frac{1}{2}\int_D\Big(\frac{(\overline{\psi^M}^t - \overline{\psi^R}^t)^2}{\sigma^2[\psi^M]} + \frac{\sigma^2[\psi^R]}{\sigma^2[\psi^M]} - 1 - \log\Big(\frac{\sigma^2[\psi^R]}{\sigma^2[\psi^M]}\Big)\Big)\,dx. \tag{13}$$

It is clear that a coarse model of high variability will have low RMSE and GRE,
whereas a poor variability will lead to a large RMSE and GRE. The last metric
measures the eddy kinetic energy (EKE), $(\rho_0/2)\|u'\|^2_{L^2(D)}$ where $u' := (I - F_t)[u]$
is the eddy velocity filtered out through a 2-year low-pass filter $F_t$ at every
point in space. For comparison purposes, we show here only the time average of this
metric (EKE) for the different models.
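
The first two metrics are simple to evaluate once the pointwise temporal means and
variances are available. The Python sketch below is a minimal illustration on a uniform
grid, assuming that (13) is the usual relative entropy between two Gaussian
distributions integrated over the domain; the function names and the `cell_volume`
quadrature weight are hypothetical.

```python
import math


def rmse_of_std(std_model, std_ref, cell_volume=1.0):
    """Discrete L2(D) norm of the difference of standard deviations."""
    return math.sqrt(cell_volume * sum((m - r) ** 2
                                       for m, r in zip(std_model, std_ref)))


def gaussian_relative_entropy(mean_m, var_m, mean_r, var_r, cell_volume=1.0):
    """Discrete version of (13): model (m) measured against reference (r)."""
    total = 0.0
    for mm, vm, mr, vr in zip(mean_m, var_m, mean_r, var_r):
        ratio = vr / vm
        total += 0.5 * ((mm - mr) ** 2 / vm + ratio - 1.0 - math.log(ratio))
    return cell_volume * total


# Identical statistics give zero; a unit mean shift at unit variance gives 0.5.
gre_same = gaussian_relative_entropy([0.0, 1.0], [1.0, 2.0], [0.0, 1.0], [1.0, 2.0])
gre_shift = gaussian_relative_entropy([0.0], [1.0], [1.0], [1.0])
```

Both quantities vanish only when the model reproduces the reference statistics exactly,
and the GRE penalises an underestimated variance through the ratio term.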

These three criteria are shown in Fig. 4 as bar plots. The DET models show very
high RMSE and GRE with a very low order of EKE, meaning that they produce poor
variability along time and fail to represent the eddies effect. Compared to the
STO-EOF, the STO-DMD models significantly increase the internal variability
and the eddy energy. Moreover, these improvements are resolution-aware. As shown
in Table 2, under a similar level of captured energy, the STO-DMD models require
far fewer modes than the STO-EOF, which first reduces the memory cost. Then,
in terms of computational cost at each step, the former generates fewer
Gaussian variables than the latter, and hence also reduces the dimension of the
matrix-vector multiplication for the spectral decomposition (3).

Fig. 4 Comparison of variability measures for different coarse models. The y-axes of the last two
figures are in log-scales


4.3 Discussion 


In order to distinguish the contribution of the correlated Girsanov drift and the
uncorrelated noise, three additional benchmark runs (at resolution 80 km) have been
performed and compared to the proposed STO-DMD model: (i)
STO-DMD without any correlation drift (i.e. $\sigma\Gamma_t = 0$); (ii) STO-DMD only with
$\sigma\Gamma_t = \overline{u'_f}^t$; (iii) a simplified deterministic version of the proposed STO-DMD
model, denoted as DET-DMD, which only encodes the (full) correlated drift $\sigma\Gamma_t$
into the DET model. We remark that for the first two runs the DMD modes used
for the correlated drift in the previous stochastic model are now included into the
noise component. As shown in Fig. 5, run (i) fails to reproduce the eastward jet
on the coarse mesh, whereas the other runs succeed. However, run (ii) produces
similar results to the STO-EOF model (see Fig. 2) with a lower improvement of
variability, and run (iii) captures more waves than the others, yet leads to a reduction
of the jet magnitude compared to the proposed STO-DMD model. In particular, by
comparing the KE spectra of the different runs, Fig. 6 illustrates that the simplified
DET-DMD model is able to produce backscattering of KE from small to large
scales, and the proposed STO-DMD enhances this result with significantly higher
KE at large scales. We observe a consistent conclusion for the EKE budget (see
Fig. 6). These comparisons demonstrate that both the correlated drift ($\sigma\Gamma_t$) and the
uncorrelated noise ($\sigma\,dB_t$) contribute to the prediction of large-scale patterns and
to the improvement of the variability of the large-scale models.

Fig. 5 Snapshots of surface PV provided by different simulations after 60-years integration. These
four figures (from left to right) correspond to the benchmark runs (i), (ii), (iii) and the proposed
STO-DMD model

Fig. 6 Comparison of KE spectra and layered EKE (only horizontally integrated) for different
coarse models


5 Conclusions


The proposed stochastic parameterization has been successfully implemented in a 
well established QG dynamical core. Different noises defined from high-resolution 
data have been considered. An additional correction drift ensuing from a change 
of probability measure has been introduced. This non-intuitive term seems quite 
important in the reproduction of the eastward jet within the wind-driven double- 
gyre circulation. Furthermore, the DMD procedure has been adopted to represent 
the quasi-periodic dynamic of the unresolved flow. The resulting random model 
enables us to improve the intrinsic variability of the large-scale resolved flow.


Deep Learning for the Benes Filter


Alexander Lobbe


Abstract The filtering problem is concerned with the optimal estimation of a 
hidden state given partial and noisy observations. Filtering is extensively studied 
in the theoretical and applied mathematical literature. One of the central challenges 
in filtering today is the numerical approximation of the optimal filter. Here, accurate 
and fast methods are actively sought after, especially for such high-dimensional 
settings as numerical weather prediction, for example. In this paper we present 
a brief study of a new numerical method based on the mesh-free neural network 
representation of the density of the solution of the filtering problem achieved 
by deep learning. Based on the classical SPDE splitting method, our algorithm 
includes a recursive normalisation procedure to recover the normalised conditional 
distribution of the signal process. The present work uses the Benes model as a 
benchmark. The Benes filter is a well-known continuous-time stochastic filtering 
model in one dimension that has the advantage of being explicitly solvable. 
Within the analytically tractable setting of the Benes filter, we discuss the role of 
nonlinearity in the filtering model equations for the choice of the domain of the 
neural network. Further, we present the first study of the neural network method 
with an adaptive domain for the Benes model. 


Keywords Nonlinear filtering - Deep learning - Stochastic PDE approximation 


1 Introduction 


Stochastic Filtering, i.e. the estimation of a signal process given only partial and 
noisy observations, is a well-studied problem, both in the theoretical and applied 
literature. It is relevant in many practical domains, for example in numerical weather 
prediction. Therefore, there is a high demand for efficient numerical methods to
approximate the optimal filter. Many such methods are known in the literature;
among them, the SPDE splitting method can be used to solve the filtering problem
in low dimensions. The reason for the inefficiency of the splitting method in higher
dimensions stems from the fact that the underlying state space must be explicitly 
discretised. This is problematic as the required number of discretisation points, 
known as the mesh, grows exponentially with the dimension of the state space. 
For this reason, the authors of [4] present a modified splitting method for the 
filtering problem which does not rely on the explicit space discretisation. The 
method developed in [4] is therefore called mesh-free and relies on a neural network 
representation of the solution. This means that, instead of approximating the values 
of the solution on a discrete mesh, we can optimize the parameters of a neural 
network defined on the state-space itself. 

In this paper we present a further study of the deep learning method developed 
in [4] on the example of the Benes filter. The algorithm is derived from the classical 
splitting method for SPDEs which consists of a deterministic PDE approximation 
step and a normalisation step to incorporate the randomness of the SPDE. Our 
algorithm replaces the PDE approximation step of the splitting method by a neural 
network representation and learning algorithm. Combined with the Monte-Carlo 
method for the normalisation step, this method becomes completely mesh-free. 
Furthermore, an important property of the methodology in the filtering context 
is the ability to iterate it over several time steps. This allows the algorithm to 
be run online and to successively process observations arriving sequentially. In 
order to be computationally feasible, the domain of the neural network needs to be 
restricted. This restricted domain needs to cover the support of the density as well 
as possible in order to yield a sensible solution. In [4] the neural network domain 
is fixed a priori and does not move with the solution. This presents two problems. 
First, it is unnecessarily large to cover the support over all timesteps. Second, the 
solution may eventually move outside the computational domain, rendering the 
approximation inadequate. It was therefore noted in [4] that a possible extension 
of the approximation method would be given by an adaptive domain as the support 
of the neural network. We present in this work the first results obtained using an 
adaptive domain in the nonlinear and analytically tractable case of the Benes filter. 

The paper is structured as follows. In Sect. 1.1 we briefly introduce the nonlinear, 
continuous-time stochastic filtering framework. The setting is identical to the one 
assumed in [4] and the reader may consult [1] for an in-depth treatment of stochastic 
filtering. Thereafter, in Sect.2.2, we formulate the Benes filtering model used as a 
benchmark. Then, in Sect. 1.2 we introduce the filtering equation and the classical 
SPDE splitting method. This is the method upon which the new algorithm in [4] was 
built. 

Next, in Sect. 2 we present an outline of the derivation of the new methodology. 
For details, the reader is referred to the original article [4]. The first idea of 
the algorithm, presented in Sect.2.1 is to reformulate the solution of the PDE 
for the density of the unnormalised filter as an expected value. This is done 
using the Feynman—Kac formula, based on an auxiliary diffusion process derived 
from the model equations. Moreover, in Sect. 2.3 we briefly specify the neural
network parameters used in the method, as well as the employed loss function. The
theoretical part of the paper is concluded with Sect. 2.4 where we show how to
normalise the obtained neural network from the prediction step using Monte-Carlo
approximation for linear sensor functions.

Section 3 contains the detailed parameter values and results of the numerical
studies that we performed. Specifically, we perform two experiments. The first one,
in Sect. 3.1, is carried out without any domain adaptation and highlights the limitations of
ad-hoc parameterization of the domain. It is a simulation of the Benes filter using 
the deep learning method over a larger domain, as well as longer time interval than 
in the paper [4]. In particular, the size of the domain was estimated using the exact 
solution of the Benes model. This is necessary, as the nonlinearity of the Benes 
model makes it difficult to know the evolution of the posterior a priori. Thus we 
would be requiring a much larger domain, if chosen in an ad-hoc way. The second 
experiment, in Sect.3.2, reports the performance of the proposed framework with 
domain adaptation. The adaptation was performed using precomputed estimates of 
the support of the filter by employing the solution formula for the Benes filter. 

Finally, we formulate the conclusions from our experiments in Sect. 4. In short, 
the domain adapted method was more effective in resolving the bimodality in our 
study than the non-domain adapted one. However, this came at the cost of a linear 
trend in the error. 


1.1 Nonlinear Stochastic Filtering Problem 


The stochastic filtering framework consists of a pair of stochastic processes $(X, Y)$
on a probability space $(\Omega, \mathcal{F}, P)$ with a normal filtration $(\mathcal{F}_t)_{t\ge 0}$ modelled, $P$-a.s.,
as

$$X_t = X_0 + \int_0^t f(X_s)\,ds + \int_0^t \sigma(X_s)\,dV_s, \tag{1}$$

and

$$Y_t = \int_0^t h(X_s)\,ds + W_t. \tag{2}$$

Here, the time parameter is $t \in [0, \infty)$, $d, p \in \mathbb{N}$ and $f : \mathbb{R}^d \to \mathbb{R}^d$ and
$\sigma : \mathbb{R}^d \to \mathbb{R}^{d\times p}$ are the drift and diffusion coefficient functions of the signal.
The processes $V$ and $W$ are $p$- and $m$-dimensional independent, $(\mathcal{F}_t)_{t\ge 0}$-adapted
Brownian motions. We call $X$ the signal process and $Y$ the observation process. The
function $h : \mathbb{R}^d \to \mathbb{R}^m$ is often called the sensor function, or link function, because
it models the possibly nonlinear connection of the signal and observation processes.

Further, consider the observation filtration $(\mathcal{Y}_t)_{t\ge 0}$ given as

$$\mathcal{Y}_t = \sigma(Y_s,\ s \in [0, t]) \vee \mathcal{N} \quad\text{and}\quad \mathcal{Y} = \sigma\Big(\bigcup_{t\in[0,\infty)}\mathcal{Y}_t\Big),$$

where $\mathcal{N}$ are the $P$-nullsets of $\mathcal{F}$. The aim of nonlinear filtering is to compute the
probability measure valued $(\mathcal{Y}_t)_{t\ge 0}$-adapted stochastic process $\pi$ that is defined by
the requirement that for all bounded measurable test functions $\varphi : \mathbb{R}^d \to \mathbb{R}$ and
$t \in [0, \infty)$ we have $P$-a.s. that

$$\pi_t\varphi = \mathbb{E}[\varphi(X_t) \mid \mathcal{Y}_t].$$

We call $\pi$ the filter.
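Furthermore, let the process $Z$ be defined such that for all $t \in [0, \infty)$,

$$Z_t = \exp\Big(-\int_0^t h(X_s)\,dW_s - \frac{1}{2}\int_0^t h(X_s)^2\,ds\Big).$$

Then, assuming that

$$\mathbb{E}\Big[\int_0^t h(X_s)^2\,ds\Big] < \infty \quad\text{and}\quad \mathbb{E}\Big[\int_0^t Z_s\,h(X_s)^2\,ds\Big] < \infty,$$

we have that $Z$ is an $(\mathcal{F}_t)_{t\ge 0}$-martingale and by the change of measure (for details,
see [1]) given by $\frac{d\tilde{P}}{dP}\big|_{\mathcal{F}_t} = Z_t$, $t \ge 0$, the processes $X$ and $Y$ are independent
under $\tilde{P}$ and $Y$ is a $\tilde{P}$-Brownian motion. Here, $\tilde{P}$ is the consistent measure defined on
$\bigcup_{t\in[0,\infty)}\mathcal{F}_t$. Finally, under $\tilde{P}$, we can define the measure valued stochastic process
$\rho$ by the requirement that for all bounded measurable functions $\varphi : \mathbb{R}^d \to \mathbb{R}$ and
$t \in [0, \infty)$ we have $\tilde{P}$-a.s. that

$$\rho_t\varphi = \tilde{\mathbb{E}}\Big[\varphi(X_t)\exp\Big(\int_0^t h(X_s)\,dY_s - \frac{1}{2}\int_0^t h(X_s)^2\,ds\Big)\,\Big|\,\mathcal{Y}_t\Big]. \tag{3}$$

The Kallianpur-Striebel formula (see [1]) justifies the terminology to call $\rho$ the
unnormalised filter.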
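
For intuition, a sample path of the signal and observation pair (1)-(2) can be generated
with the Euler-Maruyama scheme. The Python sketch below does this in one dimension for
user-supplied coefficient functions; the scheme, the function name and the
Ornstein-Uhlenbeck-type example coefficients are our own illustrative choices and are not
taken from the filtering model studied later.

```python
import math
import random


def simulate_signal_and_observation(f, sigma, h, x0, dt, n_steps, seed=0):
    """Euler-Maruyama paths of dX = f(X) dt + sigma(X) dV and dY = h(X) dt + dW."""
    rng = random.Random(seed)
    x, y = x0, 0.0
    xs, ys = [x], [y]
    for _ in range(n_steps):
        dv = rng.gauss(0.0, math.sqrt(dt))  # signal noise increment
        dw = rng.gauss(0.0, math.sqrt(dt))  # independent observation noise increment
        y += h(x) * dt + dw                 # uses the signal at the start of the step
        x += f(x) * dt + sigma(x) * dv
        xs.append(x)
        ys.append(y)
    return xs, ys


# Hypothetical example: mean-reverting signal observed through a linear sensor.
xs, ys = simulate_signal_and_observation(
    f=lambda x: -x, sigma=lambda x: 0.5, h=lambda x: x,
    x0=1.0, dt=0.01, n_steps=1000)
```

The observation path starts at zero and accumulates the sensor output plus Brownian noise,
which is the data a filter would receive sequentially.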


1.2 Filtering Equation and General Splitting Method 


Note that under the conditions given in [4], $X$ admits the infinitesimal generator
$A : \mathcal{D}(A) \to B(\mathbb{R}^d)$ given, for all $\varphi \in \mathcal{D}(A)$, by

$$A\varphi = \langle f, \nabla\varphi\rangle + \mathrm{Tr}(a\,\mathrm{Hess}\,\varphi), \tag{4}$$

where $\mathcal{D}(A)$ denotes the domain of the differential operator $A$ and $a = \frac{1}{2}\sigma\sigma^\top$. The
symbol $B(\mathbb{R}^d)$ denotes the set of real-valued, bounded, Borel-measurable functions
defined on $\mathbb{R}^d$.

It is well-known (see, e.g., [1]), that the unnormalised filter $\rho$ satisfies the filtering
equation, i.e. for all $t \ge 0$, we have $\tilde{P}$-a.s. that

$$\rho_t(\varphi) = \pi_0(\varphi) + \int_0^t \rho_s(A\varphi)\,ds + \int_0^t \rho_s(\varphi h^\top)\,dY_s. \tag{5}$$

The classical splitting method for the filtering equation is given in [3] and seeks
to approximate the following SPDE for the density $p_t$ of the unnormalised filter
given, for all $t \ge 0$, $x \in \mathbb{R}^d$, and $\tilde{P}$-a.s. as

$$p_t(x) = p_0(x) + \int_0^t A^* p_s(x)\,ds + \int_0^t h^\top(x)\,p_s(x)\,dY_s,$$

and relies on the splitting-up algorithm described in [9] and [10]. Here $A^*$ is the
formal adjoint of the infinitesimal generator $A$ of the signal process $X$.
We summarise the splitting-up method below in Note 1.


Note 1 The splitting method for the filtering problem is defined by iterating the
steps below with initial density $p^0(\cdot) = p_0(\cdot)$:

1. (Prediction) Compute an approximation $\tilde{p}^n$ of the solution to

$$\frac{\partial q^n}{\partial t}(t, z) = A^* q^n(t, z), \quad (t, z) \in (t_{n-1}, t_n] \times \mathbb{R}^d, \qquad q^n(0, z) = p^{n-1}(z), \quad z \in \mathbb{R}^d, \tag{6}$$

at time $t_n$, and

2. (Normalisation) Compute the normalisation constant with $z_n = (Y_{t_n} - Y_{t_{n-1}})/(t_n - t_{n-1})$
and the function

$$\mathbb{R}^d \ni z \mapsto \xi_n(z) = \exp\Big(-\frac{t_n - t_{n-1}}{2}\,\|z_n - h(z)\|^2\Big),$$

so that we can set

$$p^n(z) = \frac{1}{C_n}\,\xi_n(z)\,\tilde{p}^n(z), \quad z \in \mathbb{R}^d,$$

where $C_n = \int_{\mathbb{R}^d} \xi_n(z)\,\tilde{p}^n(z)\,dz$.

The deep learning method studied below replaces the predictor step of the splitting
method above by a deep neural network approximation algorithm to avoid an
explicit space discretisation. This is achieved by representing each $\tilde{p}^n(z)$ by a
feed-forward neural network and approximating the initial value problem (6) based
on its stochastic representation using a sampling procedure. The normalisation
step may then be computed either using quadrature, or, to preserve the mesh-free
characteristic, by Monte-Carlo approximation.
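
The normalisation step of the splitting method can be sketched with simple quadrature in
one dimension. The following Python example is a hedged illustration: it assumes a scalar
state, a uniform grid with a Riemann sum for the normalisation constant, and a standard
Gaussian predicted density chosen only for demonstration; the function name is
hypothetical.

```python
import math


def normalisation_step(grid, p_pred, y_increment, dt, h):
    """Reweight a predicted density by the likelihood factor and renormalise."""
    z_n = y_increment / dt
    # Likelihood factor xi_n(z) = exp(-(dt / 2) * (z_n - h(z))**2).
    xi = [math.exp(-0.5 * dt * (z_n - h(z)) ** 2) for z in grid]
    dz = grid[1] - grid[0]
    # Normalisation constant C_n by a Riemann sum on the uniform grid.
    c_n = sum(x * p for x, p in zip(xi, p_pred)) * dz
    posterior = [x * p / c_n for x, p in zip(xi, p_pred)]
    return posterior, c_n


# Demonstration: standard Gaussian prediction, linear sensor h(z) = z.
grid = [-5.0 + 0.01 * i for i in range(1001)]
p_pred = [math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi) for z in grid]
dt, y_increment = 0.1, 0.2
posterior, c_n = normalisation_step(grid, p_pred, y_increment, dt, h=lambda z: z)
dz = grid[1] - grid[0]
mass = sum(posterior) * dz
mean = sum(z * p for z, p in zip(grid, posterior)) * dz
```

For this Gaussian example the updated mean can be computed by hand as
`y_increment / (1 + dt)`, which gives a convenient sanity check; a Monte-Carlo estimate of
the same integral would keep the procedure mesh-free.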


2 Derivation and Outline of the Deep Learning Algorithm 


Here, we present a concise version of the derivation laid out in detail in [4]. 


2.1 Feynman-Kac Representation 


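Assuming sufficient differentiability of the coefficient functions, the operator $A^*$
may be expanded such that for all compactly supported smooth test functions
$\varphi \in C_c^\infty(\mathbb{R}^d, \mathbb{R})$ we have

$$A^*\varphi = \mathrm{Tr}(a\,\mathrm{Hess}\,\varphi) + \langle 2\,\mathrm{div}(a) - f, \mathrm{grad}\,\varphi\rangle + \mathrm{div}(\mathrm{div}(a) - f)\,\varphi. \tag{7}$$

Subtracting the zero-order term from (7), we obtain an operator that generates the
auxiliary diffusion process, denoted $\hat{X}$, which is instrumental in the deep learning
method.


Definition 1 Define the partial differential operator $\hat{A} : C_c^\infty(\mathbb{R}^d, \mathbb{R}) \to C_b(\mathbb{R}^d, \mathbb{R})$,
with image in the set of bounded continuous functions on $\mathbb{R}^d$, such that for all
$\varphi \in C_c^\infty(\mathbb{R}^d, \mathbb{R})$,

$$\hat{A}\varphi = \mathrm{Tr}(a\,\mathrm{Hess}\,\varphi) + \langle 2\,\mathrm{div}(a) - f, \mathrm{grad}\,\varphi\rangle$$

and the function $r : \mathbb{R}^d \to \mathbb{R}$ such that for all $x \in \mathbb{R}^d$,

$$r(x) = \mathrm{div}(\mathrm{div}(a) - f)(x).$$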


Lemma 1 For all $x \in \mathbb{R}^d$ the operator $\hat{A}$ defined in Definition 1 is the infinitesimal
generator of the Itô diffusion $\hat{X} : [0, \infty) \times \Omega \to \mathbb{R}^d$ given, for all $t \ge 0$ and
$P$-a.s. by

$$\hat{X}_t = x + \int_0^t b(\hat{X}_s)\,ds + \int_0^t \sigma(\hat{X}_s)\,d\hat{W}_s,$$

where $\hat{W} : [0, \infty) \times \Omega \to \mathbb{R}^d$ is a $d$-dimensional Brownian motion and
$b : \mathbb{R}^d \to \mathbb{R}^d$ is the function

$$b = 2\,\mathrm{div}(a) - f.$$

From the well-known Feynman-Kac formula (see Karatzas and Shreve [6,
Chapter 5, Theorem 7.6]) we can deduce Corollary 1 below for the initial value
problem.


Corollary 1 Let $d \in \mathbb{N}$, $T > 0$, let $k : \mathbb{R}^d \to [0, \infty)$ be a continuous function,
let $\hat{A}$ be the operator defined in Definition 1, and let $\psi : \mathbb{R}^d \to \mathbb{R}$ be an at
most polynomially growing function. Suppose that $u \in C_b^{1,2}([0, T] \times \mathbb{R}^d, \mathbb{R})$ is
continuously differentiable with bounded derivative in time and twice continuously
differentiable with bounded derivatives in space, and satisfies the Cauchy problem

$$\frac{\partial u}{\partial t}(t, x) + k(x)\,u(t, x) = \hat{A}u(t, x), \quad (t, x) \in (0, T] \times \mathbb{R}^d, \qquad u(0, x) = \psi(x), \quad x \in \mathbb{R}^d. \tag{8}$$

Then, for all $(t, x) \in (0, T] \times \mathbb{R}^d$, we have that

$$u(t, x) = \mathbb{E}\Big[\psi(\hat{X}_t)\exp\Big(-\int_0^t k(\hat{X}_\tau)\,d\tau\Big)\,\Big|\,\hat{X}_0 = x\Big],$$

where $\hat{X}$ is the diffusion generated by $\hat{A}$.


Recall that our aim is to approximate the Fokker–Planck equation (6). Assume
from now on the discrete times $\{t_0 = 0, t_1, t_2, \dots\}$, indexed by $n$. Written in the form
of Corollary 1, for any timestep $n = 1, 2, \dots$, (6) reads as

$$\begin{aligned}
\frac{\partial q^n}{\partial t}(t, z) &= A q^n(t, z) + r(z)\,q^n(t, z), & (t, z) &\in (t_{n-1}, t_n] \times \mathbb{R}^d, \\
q^n(0, z) &= p^{n-1}(z), & z &\in \mathbb{R}^d.
\end{aligned}$$

Thus, with $k = -r$, and assuming that $-r$ is non-negative in (8), we obtain by
Corollary 1 the representation, for all $n \in \{1, \dots, N\}$, $t \in (t_{n-1}, t_n]$, $z \in \mathbb{R}^d$,

$$q^n(t, z) = \mathbb{E}\left[p^{n-1}(\hat{X}_t) \exp\left(\int_{t_{n-1}}^{t} r(\hat{X}_\tau)\,d\tau\right) \,\Big|\, \hat{X}_{t_{n-1}} = z\right]. \tag{9}$$
tn—1 


Note that [4, Proposition 2.4] shows that we have a feasible minimisation 
problem to approximate by the learning algorithm (see also [2, Proposition 2.7]). 

2.2 The Benes Filtering Model


The Benes filter is a one-dimensional nonlinear model and is used as a benchmark 
in the numerical studies below. As we show below, it is one of the rare cases 
of explicitly solvable continuous-time stochastic filtering models. Here, we are 
considering a special case of the more general class of Benes filters, presented, for 
example, in [1, Chapter 6.1]. 

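The signal is given by the coefficient functions

$$f(x) = \alpha\sigma \tanh(\beta + \alpha x/\sigma) \quad \text{and} \quad \sigma(x) \equiv \sigma \in \mathbb{R},$$

where $\alpha, \beta \in \mathbb{R}$, and the observation is given by the affine-linear sensor function

$$h(x) = h_1 x + h_2,$$

with $h_1, h_2 \in \mathbb{R}$. The density $p_t$ of the filter solving the Benes model is then given
by two weighted Gaussians (see [1, Chapter 6.1]) as

$$p_t(z) = w_t^{+}\,\Phi(\mu_t^{+}, \nu_t)(z) + w_t^{-}\,\Phi(\mu_t^{-}, \nu_t)(z), \tag{10}$$

where $\mu_t^{\pm} = M_t^{\pm}/(2\upsilon_t)$, $\nu_t = 1/(2\upsilon_t)$, and

$$w_t^{\pm} = \frac{\exp\left((M_t^{\pm})^2/(4\upsilon_t)\right)}{\exp\left((M_t^{+})^2/(4\upsilon_t)\right) + \exp\left((M_t^{-})^2/(4\upsilon_t)\right)}$$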


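with

$$M_t^{\pm} = \pm\frac{\alpha}{\sigma} + h_1 \int_0^t \frac{\sinh(s\zeta\sigma)}{\sinh(t\zeta\sigma)}\,dY_s + \frac{h_2 + h_1 x_0}{\sigma \sinh(t\zeta\sigma)} - \frac{h_2}{\sigma}\coth(t\zeta\sigma),$$

$\upsilon_t = h_1 \coth(t\zeta\sigma)/(2\sigma)$, and $\zeta = \sqrt{\alpha^2/\sigma^2 + h_1^2}$.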
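Further, for the Benes model, the auxiliary diffusion is given as

$$\hat{X}_t = \hat{X}_0 - \int_0^t \alpha\sigma \tanh(\beta + \alpha \hat{X}_s/\sigma)\,ds + \int_0^t \sigma\,dW_s,$$

and the coefficient

$$r(x) = -\operatorname{div} f(x) = -\alpha^2 \operatorname{sech}^2(\beta + \alpha x/\sigma).$$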


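Therefore the representation of the solution to the Fokker–Planck equation (6) in the
Benes case reads

$$q^n(t, z) = \mathbb{E}\left[p^{n-1}(\hat{X}_t) \exp\left(-\int_{t_{n-1}}^{t} \alpha^2 \operatorname{sech}^2(\beta + \alpha \hat{X}_\tau/\sigma)\,d\tau\right) \,\Big|\, \hat{X}_{t_{n-1}} = z\right].$$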
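The Benes representation above lends itself to a plain Monte-Carlo estimate. The following is a minimal sketch, not the authors' implementation: the function name `benes_feynman_kac`, its arguments, and the left-point Riemann sum for the time integral are assumptions made for illustration. The paths of the auxiliary diffusion (drift $-f$, constant diffusion coefficient $\sigma$) are simulated with the Euler-Maruyama scheme.

```python
import numpy as np


def benes_feynman_kac(p_prev, z, dt, alpha, beta, sigma,
                      n_paths=10_000, n_steps=50, rng=None):
    """Monte-Carlo estimate of q(t_{n-1} + dt, z) for the Benes model.

    p_prev : vectorised callable, stands in for the previous density p^{n-1}
    z      : starting point of the auxiliary diffusion
    """
    rng = np.random.default_rng() if rng is None else rng
    h = dt / n_steps
    x = np.full(n_paths, float(z))
    integral = np.zeros(n_paths)  # left-point sum of alpha^2 sech^2(...)
    for _ in range(n_steps):
        arg = beta + alpha * x / sigma
        integral += alpha**2 / np.cosh(arg) ** 2 * h
        # Euler-Maruyama step: drift -alpha*sigma*tanh(...), diffusion sigma
        noise = rng.standard_normal(n_paths)
        x = x - alpha * sigma * np.tanh(arg) * h + sigma * np.sqrt(h) * noise
    return float(np.mean(p_prev(x) * np.exp(-integral)))
```

With `alpha = 0` the drift and the exponential weight vanish, so the estimate reduces to a Gaussian convolution of `p_prev`, which gives a simple sanity check of the sampler.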

2.3 Neural Network Model for the Prediction Step


To solve the Fokker–Planck equation over a rectangular domain
$\Omega_d = [\alpha_1, \beta_1] \times \dots \times [\alpha_d, \beta_d]$, we employ the sampling based deep learning
method from [2]. Using the representation (9), the solution of the Fokker–Planck equation
is reformulated into an optimisation problem over function space given in [4, Proposition 2.4].
This in turn yields the loss functions for the learning algorithm. Writing $\hat{X}^{\xi}$ for the
auxiliary diffusion with $\mathrm{Unif}(\Omega_d)$-random initial value $\xi$, the optimisation problem
is approximated by the optimisation

$$\inf_{\theta \in \mathbb{R}^{\sum_{i=0}^{L-1} l_{i+1}(l_i + 1)}} \mathbb{E}\left[\left|\psi(\hat{X}_T^{\xi}) \exp\left(-\int_0^T k(\hat{X}_\tau^{\xi})\,d\tau\right) - \mathcal{NN}_\theta(\xi)\right|^2\right],$$


where the solution of the PDE is represented by a neural network $\mathcal{NN}_\theta$ and the
infinite-dimensional function space has been parametrised by $\theta$. Here, $L$ denotes
the depth of the neural net, and the parameters $l_i$ are the respective layer widths.
Further details can be found in [4]. A comprehensive textbook on deep learning
is [5]. We apply a modified gradient descent method, called ADAM [7], to determine
the parameters in the model by minimising the loss function


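$$\mathcal{L}\Big(\theta; \big\{\xi^i, \{\hat{X}^{\xi^i}_{\tau_j}\}_{j=0}^{J}\big\}_{i=1}^{N_b}\Big) = \frac{1}{N_b} \sum_{i=1}^{N_b} \Big|\psi(\hat{X}^{\xi^i}_{\tau_J}) \exp\Big(-\sum_{j=0}^{J-1} k(\hat{X}^{\xi^i}_{\tau_j})(\tau_{j+1} - \tau_j)\Big) - \mathcal{NN}_\theta(\xi^i)\Big|^2,$$

where $N_b$ is the batch size and $\big\{\xi^i, \{\hat{X}^{\xi^i}_{\tau_j}\}_{j=0}^{J}\big\}_{i=1}^{N_b}$ is a training batch of independent
identically distributed realisations $\xi^i$ of $\xi \sim U(\Omega_d)$ and $\{\hat{X}^{\xi^i}_{\tau_j}\}_{j=0}^{J}$ the approximate
i.i.d. realisations of sample paths of the auxiliary diffusion started at $\xi^i$ over the
time-grid $\tau_0 = 0 < \tau_1 < \dots < \tau_{J-1} < \tau_J = T$. For the approximation of the
sample paths of the diffusion we use the Euler-Maruyama method [8]. Additionally,
we augment the loss $\mathcal{L}$ by an additional term to encourage the positivity of the neural
network. Thus, in practice, we use the loss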


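$$\tilde{\mathcal{L}}\Big(\theta; \big\{\xi^i, \{\hat{X}^{\xi^i}_{\tau_j}\}_{j=0}^{J}\big\}_{i=1}^{N_b}\Big) = \mathcal{L}\Big(\theta; \big\{\xi^i, \{\hat{X}^{\xi^i}_{\tau_j}\}_{j=0}^{J}\big\}_{i=1}^{N_b}\Big) + \lambda \sum_{i=1}^{N_b} \max\{0, -\mathcal{NN}_\theta(\xi^i)\},$$

with the hyperparameter $\lambda$ to be chosen.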
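As an illustration of how the empirical loss with the positivity penalty could be evaluated for one batch in the one-dimensional case, consider the following NumPy sketch. The function name `feynman_kac_loss` and the array layout are assumptions for illustration; `model` stands in for the neural network, and a real training loop would use an automatic-differentiation framework rather than plain NumPy.

```python
import numpy as np


def feynman_kac_loss(model, psi, k, xi, paths, taus, lam=1.0):
    """Empirical loss plus positivity penalty for one training batch.

    model : callable mapping initial points (N_b,) to network outputs (N_b,)
    psi, k: vectorised initial condition and potential
    xi    : initial points, shape (N_b,)
    paths : simulated auxiliary paths on the grid taus, shape (N_b, J + 1)
    taus  : time grid tau_0 < ... < tau_J, shape (J + 1,)
    """
    steps = np.diff(taus)                                # tau_{j+1} - tau_j
    integral = np.sum(k(paths[:, :-1]) * steps, axis=1)  # sum over j = 0..J-1
    target = psi(paths[:, -1]) * np.exp(-integral)
    out = model(xi)
    mse = np.mean((target - out) ** 2)
    penalty = lam * np.sum(np.maximum(0.0, -out))        # active only where out < 0
    return float(mse + penalty)
```

The penalty term is zero whenever the network output is non-negative on the batch, so it only steers the optimiser away from negative density values.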
Thus, in the notation of Sect. 1.2 we replace the Fokker–Planck solution by a
neural network model, i.e. we postulate a neural network model

$$\tilde{p}^n(z) = \mathcal{NN}(z),$$

with support on $\Omega_d$. Therefore we require the a priori chosen domain to capture
most of the mass of the probability distribution it is approximating.


2.4 Monte-Carlo Normalisation Step 


We then realise the normalisation step via Monte-Carlo sampling over the bounded
rectangular domain $\Omega_d$ to approximate the integral

$$\int_{\mathbb{R}^d} \xi_n(z)\,\mathcal{NN}(z)\,dz = \int_{\Omega_d} \exp\left(-\frac{t_n - t_{n-1}}{2}\,\|z_n - h(z)\|^2\right) \mathcal{NN}(z)\,dz, \tag{11}$$

where, as defined earlier, $z_n = \frac{1}{t_n - t_{n-1}}(Y_{t_n} - Y_{t_{n-1}})$. Note that, since $\Omega_d$ is the
support of the neural network $\mathcal{NN}$, the right-hand side above is indeed identical to
the integral over the whole space.

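The sensor function in the Benes model is given by $h(x) = h_1 x + h_2$. Then, the
likelihood function becomes

$$\xi_n(z) = \sqrt{\frac{2\pi}{(t_n - t_{n-1})h_1^2}}\;\mathcal{N}_{\mathrm{pdf}}\left(z;\ \frac{z_n - h_2}{h_1},\ \frac{1}{(t_n - t_{n-1})h_1^2}\right),$$

where $\mathcal{N}_{\mathrm{pdf}}(\,\cdot\,; \mu, \sigma^2)$ denotes the probability density function of a normal distribution
with mean $\mu$ and variance $\sigma^2$. Therefore, we can write the integral (11) as

$$\sqrt{\frac{2\pi}{(t_n - t_{n-1})h_1^2}}\;\mathbb{E}_{Z \sim \mathcal{N}\left(\frac{z_n - h_2}{h_1},\ \frac{1}{(t_n - t_{n-1})h_1^2}\right)}\big[\mathcal{NN}(Z)\big].$$

This is an implementable method to compute the normalisation constant $C_n$. Thus,
we can express the approximate posterior density as

$$p^n(z) = \frac{1}{C_n}\,\xi_n(z)\,\tilde{p}^n(z).$$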


Therefore, the methodology is fully recursive and can be applied sequentially. 


Remark 1 In low dimensions, the usage of the Monte-Carlo method to perform the
normalisation is optional, since efficient quadrature methods are an alternative. We 
chose the sampling based method to preserve the grid-free nature of the algorithm. 
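The sampling-based normalisation for the affine sensor function can be sketched as follows. The function name `normalisation_constant` and its arguments are hypothetical, and treating samples that fall outside the domain as contributing zero (with the fraction inside reported as an acceptance rate) is an assumption about how the bounded support is handled, not a statement of the authors' code.

```python
import numpy as np


def normalisation_constant(nn, z_n, dt, h1, h2, domain,
                           n_samples=100_000, rng=None):
    """Monte-Carlo estimate of C_n for the sensor h(x) = h1 * x + h2.

    nn     : vectorised callable, stands in for the network prior on `domain`
    domain : (lower, upper) bounds outside of which the prior is zero
    Returns the estimate of C_n and the fraction of samples inside the domain.
    """
    rng = np.random.default_rng() if rng is None else rng
    var = 1.0 / (dt * h1**2)           # variance of the Gaussian likelihood in z
    mean = (z_n - h2) / h1
    samples = rng.normal(mean, np.sqrt(var), size=n_samples)
    lo, hi = domain
    inside = (samples >= lo) & (samples <= hi)
    # evaluate the prior only inside its support; outside it contributes zero
    values = np.where(inside, nn(np.clip(samples, lo, hi)), 0.0)
    c_n = np.sqrt(2.0 * np.pi * var) * values.mean()
    return float(c_n), float(inside.mean())
```

A low acceptance rate signals that the likelihood has moved towards or beyond the boundary of the chosen domain.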

3 Numerical Results for the Benes Filter


The neural network architecture for all our experiments below is a feed-forward 
fully connected neural network with a one-dimensional input layer, two hidden 
layers with a layer width of 51 neurons each and batch-normalisation, and an 
output layer of dimension one (a detailed illustration can be found in [4]). For 
the optimisation algorithm we chose the ADAM optimiser and performed the 
training over 6002 epochs with a batch size of 600 samples. The initial signal 
and observation values are $x_0 = y_0 = 0$ and the coefficients of the Benes model
were chosen as $\alpha = 3$, $\beta = 0$, $\sigma = 0.5$, $h_1 = 3$, $h_2 = 0$, and timestep
$\Delta t = 0.1$ over $N = 40$ steps. The initial condition is a Gaussian density with
mean 0 and standard deviation 0.001. The posterior was calculated over the domain 
$[-9, 2.5]$. The domain boundaries were pre-estimated using a simulation of the
exact Benes filter with fixed random seed. In the case of the domain adaptation 
we used the precomputed evolutions from the true solution to estimate the support 
of the posterior and set a fixed domain adaptation schedule. The spatial resolution is 
1000 uniformly spaced values in the domain of definition of the neural network. At 
each time step, the training of the network consumes $6002 \cdot 600 = 3{,}601{,}200$ Monte-
Carlo samples. Additionally we employ a piecewise constant learning rate schedule
$\mathrm{lr}(\mathrm{epoch}) = 10^{-(2 + \lfloor \mathrm{epoch}/2001 \rfloor)}$ and the normalisation constant is computed
using $10^7$ samples each timestep. The regularising parameter is $\lambda = 1$.


3.1 No Domain Adaptation 


Figure 1 shows the plots for the Benes filter without domain adaptation. In Fig. 1a
we observe the drift of the posterior toward the left edge of the domain. The initial 
bimodality, reflecting the uncertainty due to few observed values, quickly resolves 
and the approximate posterior tracks the signal within the domain. In Fig. 1b the 
bimodality is mostly visible in the Monte-Carlo prior and smoothed out by the 
neural network. Figure 1c and d show snapshots of the progression of the filter. 
The absolute error in means with respect to the Benes reference solution is plotted 
in Fig. 2a and shows that as the posterior reaches the left domain boundary, the 
error increases. This is reflected as well in the drop of probability mass, Fig. 2c, 
and Monte-Carlo acceptance rate, Fig. 2d at later times. It is not clear from Fig. 2a 
if there is a trend in the error. Further experiments need to be performed to check 
this hypothesis. Figure 2b shows that the neural net training consistently succeeds 
as measured by the L2 distance between the Monte-Carlo reference prior and the 
neural net prior. 


Fig. 1 Results of the combined splitting-up/machine-learning approximation applied iteratively
to the Benes filtering problem (no domain adaptation). (a) The full evolution of the estimated
posterior distribution produced by our method, plotted at all intermediate timesteps. (b–d)
Snapshots of the approximation at times $t = 0.6$, $t = 1.8$, and $t = 3.9$. The black dotted line
in each graph shows the estimated posterior, the yellow line the prior estimate represented by the
neural network, and the light-blue shaded line shows the Monte-Carlo reference solution for the
prior.

Fig. 2 Error and diagnostics for the Benes filter (no domain adaptation). (a) Absolute error in
means between the approximated distribution and the exact solution. (b) L2 error of the neural 
network during training with respect to the Monte-Carlo reference solution. (c) Probability mass 
of the neural network prior. (d) Monte-Carlo acceptance rate 


3.2 With Domain Adaptation 


Figure 3 shows the plots for the Benes filter with domain adaptation. In Fig. 3a
we again observe the drift of the posterior toward the left edge of the domain, and
the initial bimodality resolves. The approximate posterior tracks the signal within
the domain. In Fig. 3b the bimodality is visible both in the prior and the posterior
network. This shows that the domain adaptation helps resolve the bimodality in the
nonlinear case by increasing the spatial resolution while keeping the computational
cost equal. Figure 3c and d again show snapshots of the progression of the filter.
The absolute error in means with respect to the Benes reference solution is plotted
in Fig. 4a and shows a clear linear trend. This is an interesting phenomenon, likely
due to the reduced domain size and subsequent error accumulation. The probability
mass (Fig. 4c) and the Monte-Carlo acceptance rate (Fig. 4d) fluctuate stably.
Figure 4b shows here again that the neural net training consistently succeeds.

Fig. 3 Results of the combined splitting-up/machine-learning approximation applied iteratively
to the Benes filtering problem (with domain adaptation). (a) The full evolution of the estimated
posterior distribution produced by our method, plotted at all intermediate timesteps. (b–d)
Snapshots of the approximation at times $t = 0.6$, $t = 1.8$, and $t = 3.9$. The black dotted line
in each graph shows the estimated posterior, the yellow line the prior estimate represented by the
neural network, and the light-blue shaded line shows the Monte-Carlo reference solution for the
prior.

Fig. 4 Error and diagnostics for the Benes filter (with domain adaptation). (a) Absolute error in
means between the approximated distribution and the exact solution. (b) L2 error of the neural 
network during training with respect to the Monte-Carlo reference solution. (c) Probability mass 
of the neural network prior. (d) Monte-Carlo acceptance rate 


4 Conclusion and Outlook 


We have studied the domain adaptation in our method from [4] on the example of 
the Benes filter. We observed that the domain adapted method was more effective in 
resolving the bimodality than the non-domain adapted one. However, this came at 
the cost of a linear trend in the error. A possible direction for future work would thus 
be to investigate the optimal domain size more closely, in order to mitigate the error 
trend, and make full use of the increased resolution from the domain adaptation. 
This is the subject of future research in connection with more general domain adaptation
methods than the one employed here, which is specific to the Benes filter. 

As already noted in the previous work [4], the possibility for transfer learning in 
our method should be explored. 
A long-term goal in the development of neural network based numerical methods
must of course be the rigorous error analysis, which remains a challenging task.


End-to-End Kalman Filter in a High Dimensional Linear Embedding of the
Observations


Said Ouala, Pierre Tandeo, Bertrand Chapron, Fabrice Collard, 
and Ronan Fablet 


Abstract Data assimilation techniques are the state-of-the-art approaches in the 
reconstruction of a spatio-temporal geophysical state such as the atmosphere or the 
ocean. These methods rely on a numerical model that fills the spatial and temporal 
gaps in the observational network. Unfortunately, limitations regarding the uncer- 
tainty of the state estimate may arise when considering the restriction of the data 
assimilation problems to a small subset of observations, as encountered for instance 
in ocean surface reconstruction. These limitations motivated the exploration of 
reconstruction techniques that do not rely on numerical models. In this context, 
the increasing availability of geophysical observations and model simulations 
motivates the exploitation of machine learning tools to tackle the reconstruction 
of ocean surface variables. In this work, we formulate sea surface spatio-temporal 
reconstruction problems as state space Bayesian smoothing problems with unknown 
augmented linear dynamics. The solution of the smoothing problem, given by the 
Kalman smoother, is written in a differentiable framework which allows, given some 
training data, to optimize the parameters of the state space model. 


Keywords Kalman filter - Machine learning - Spatio-temporal interpolation 


1 Introduction 


Data assimilation in a broad sense can be considered as the inference of a hidden 
state, based on several sources of information. When considering data assimilation 
in the context of oceanography, these schemes exploit, in addition to some given
observations, a dynamical model to perform simulations from given ocean states [1]. 
Unfortunately, realistic analytic parameterizations of the dynamical model, in the 
context of sea surface variables reconstruction, lead to computationally demanding 
representations [2]. Furthermore, when associated with a small subset of observations
(as encountered for instance when assimilating sea surface variables with a global 
ocean model), these realistic models may result in modeling and inversion uncer- 
tainties. On the other hand, the analytic derivation of computationally-efficient, 
low-order models involves theoretical assumptions, which may not be fulfilled 
by real observations. These limitations motivated the exploration of interpolation 
techniques that do not require an explicit dynamical representation. Among other 
methods, Optimal Interpolation (OI) became the state-of-the-art framework [3, 4]. 
This technique does not need an explicit formulation of the dynamical model and 
rather relies on the modeling of the covariance of the spatio-temporal fields.
Despite the success of OI, this technique tends to smooth the fine-scale structures,
which motivates the development of new spatio-temporal interpolation schemes,
mainly based on machine learning representations [5—10]. 

From the perspective of the machine learning community, state-of-the-art reconstruction
techniques are usually formulated as inverse problems, where one seeks
to maximize the reconstruction performance of an inversion model, given the
observed field as an input. Several methods were developed for this purpose 
in the fields of signal denoising [11, 12] and image inpainting [13] where the 
inversion model typically relies on a deep learning architecture. This end-to-end 
learning strategy differs from classical inversion techniques used in geosciences,
where the state-space representations (specifically the dynamical models) and the 
inversion schemes are a priori unrelated. The recent exploration of machine learning 
representations in the context of sea surface fields reconstruction was inspired by the 
latter methodological viewpoint, where a data-driven dynamical model is optimized 
based on the minimization of a forecasting cost. This data-driven prior is then 
plugged into a data assimilation framework to perform reconstruction based on 
classical (Kalman based, variational formulations) inversion schemes [7, 14, 8]. 

Recently, several works investigated end-to-end deep learning architectures in the
resolution of reconstruction issues in geosciences [15-17, 10]. However, these tools,
although relevant, were naturally explored in the context of image denoising and
inpainting applications due to the lack of methodological formulation. When considering
geosciences applications, a huge effort was carried out within the geosciences
community to derive reconstruction algorithms that, beyond being efficient with
respect to a given metric, are robust and rely on a solid methodological formulation.
From this point of view, we believe that end-to-end deep learning techniques should
build on such methodological knowledge to propose new reconstruction solutions
that both achieve a decent performance score and remain theoretically relevant,
which helps the understanding and generalization of these algorithms. Accordingly,
we exploit ideas from machine learning and Bayesian filtering to
propose a framework that is able to provide a relevant reconstruction of a spatio-temporal
state. Specifically, we formulate a new state space model for ocean surface
observations based on an augmented linear dynamical system. Assuming that the
model and observation errors are Gaussian, the solution of the filtering/smoothing 
problem on this new state space model is given by the Kalman filter/smoother. 
Inspired by deep learning architectures, the Kalman recursion is written in a 
differentiable framework, which allows for the derivation of the parameters of the 
new state-space model based on a reconstruction cost of the observations. 


2 Method 


Motivation Let us assume the following state-space model

$$\dot{x}_t = f(x_t) + \eta_t \tag{1}$$
$$y_t = H_t(x_t) + \epsilon_t \tag{2}$$

where $t \in [0, +\infty)$ is time. The variables $x_t \in \mathbb{R}^s$ and $y_t \in \mathbb{R}^n$ represent
the state variables and the observations respectively. $f$ and $H_t$ are the dynamical
and observation operators. $\eta_t$ and $\epsilon_t$ are random processes accounting for the
uncertainties. They are defined as centered Gaussian processes with covariances
$Q_t$ and $R_t$ respectively.

In the context of geosciences, and when considering the resolution of filtering and
smoothing problems using data assimilation, the dynamical and observation models
$f$ and $H_t$, the model and observation error covariances $Q_t$ and $R_t$, as well as the true
state $x_t$ of Eqs. (1) and (2) are either unavailable or too complicated to handle. In
this context, we show in this work how to exploit observations $y_t$ sampled from time
$t_1$ to time $t_N$ to learn a Bayesian scheme that allows for reconstruction applications
given new observations (i.e., at time $t > t_N$).


Definition of a New State Space Model In this work, we consider an embedding
of the observations as proposed in [18]. Specifically, we project our observations
(or a reduced order version of our observations) into a higher dimensional space
where the dynamics of the observations are assumed to be linear. Formally, to derive
our new state-space model, we first write an augmented state
$u_t$ such that $u_t^T = [(My_t)^T, z_t^T]$, where $z_t \in \mathbb{R}^l$ is the unobserved component of the
augmented state $u_t$ and $M \in \mathbb{R}^{r \times n}$, with $r \le n$, is a linear projection operator (that
can be used for instance in the context of reduced order modeling). The matrix $M$ is
assumed to have $r$ orthonormal rows, so that the matrix $M^{-1} = M^T$ verifies $MM^{-1} = I$.
We used in this work an Empirical Orthogonal Functions (EOF) projection. This
constrains the rows of $M$ to be orthonormal eigenvectors of the covariance matrix of
the centered data. The augmented state $u_t \in \mathbb{R}^{d_E}$, with $d_E = l + r$, evolves in time
according to the following state-space model:

$$\dot{u}_t = A_\varphi u_t + \eta_t \tag{3}$$
$$y_t = M^{-1} G u_t + \epsilon_t \tag{4}$$


where the dynamical operator $A_\varphi$ is a $d_E \times d_E$ matrix with coefficients $\varphi$, and $G$
is a projection matrix that satisfies $My_t = Gu_t$. The eigenvalues of the matrix
$A_\varphi$ encode the decaying and oscillating modes of the dynamics that are learned
from data. Furthermore, the matrix $A_\varphi$ can be constrained to be skew-symmetric
(simply by imposing $A_\varphi = 0.5(B_\varphi - B_\varphi^T)$ with $B_\varphi$ a trainable matrix), so that the
solution of (3) is a weighted sum of $d_E/2$ trainable oscillations,
whose frequencies are encoded in the imaginary parts of the
eigenvalues of $A_\varphi$. This formulation is highly suitable for Hamiltonian (conservative)
dynamical systems since the energy of the system is conserved, which
guarantees long-term boundedness of the model. Furthermore, this formulation
differs fundamentally from classical Auto Regressive (AR) models written in the
space of the observations. Indeed, simple AR models only have $r \le d_E$
eigenvalues, which limits their expressivity.
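The effect of the skew-symmetric constraint can be checked numerically. In the sketch below a random matrix stands in for the trainable matrix, and the dimension and time step are arbitrary illustrative values; the matrix exponential is computed through an eigendecomposition, which is valid here because a real skew-symmetric matrix is normal.

```python
import numpy as np

rng = np.random.default_rng(0)
d_e = 6                                   # illustrative embedding dimension
B = rng.standard_normal((d_e, d_e))       # stands in for the trainable matrix
A = 0.5 * (B - B.T)                       # skew-symmetric by construction

eigvals, eigvecs = np.linalg.eig(A)
# Eigenvalues are purely imaginary and come in conjugate pairs,
# which leaves d_e / 2 distinct oscillation frequencies.
frequencies = np.sort(np.abs(eigvals.imag))[::2]

dt = 0.1                                  # illustrative prediction time step
# F = exp(dt * A) via the eigendecomposition of A
F = (eigvecs @ np.diag(np.exp(dt * eigvals)) @ np.linalg.inv(eigvecs)).real

# The propagator is orthogonal, so the norm of the state is conserved.
u0 = rng.standard_normal(d_e)
u = u0.copy()
for _ in range(1000):
    u = F @ u
print(np.linalg.norm(u0), np.linalg.norm(u))
```

The two printed norms agree up to round-off, which is the long-term boundedness property in its simplest, noise-free form.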

It is worth noting that this formulation closely relates to the Koopman operator
[19] where the augmented state $u_t$ can be seen as a finite dimensional approximation
of the infinite dimensional Hilbert space of measurements of the hidden state $x_t$.
This model takes advantage of a linear formulation of the dynamics in a space
of observables, where the resulting model is perfectly linear for a category of
dynamical regimes (typically periodic and quasi-periodic ones), and can provide
a decent short-term approximation of chaotic regimes. It can also be seen as a
generalization of the Dynamic Mode Decomposition (DMD) method, in which
$u_t = My_t$.


Model and Observations Error Covariances The model and observation errors
$\eta_t$ and $\epsilon_t$ are assumed to follow Gaussian distributions with zero mean and
covariance matrices $Q_{\lambda,t}$ and $R_{\phi,t}$, respectively. These covariance models can be
parameterized as neural networks with parameter vectors $\lambda$ and $\phi$.


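Smoothing Scheme A Kalman smoother, based on the above state-space model,
is written in a differentiable framework. The idea is to derive an analytical solution
of the posterior distribution $p(u_t \mid y_{t_1:t_N})$ based on the Kalman recursion. Formally,
given a regular time discretization $t \in [t_1, \dots, t_N]$ where $N$ is a positive integer and
given the initial moments $u^a_{t_0}$ and $P^a_{t_0}$, the mean $u^s_t$ and covariance $P^s_t$ of the posterior
distribution $p(u_t \mid y_{t_1:t_N})$ can be computed as follows:

$$u^f_{t+1} = F u^a_t \tag{5}$$
$$P^f_{t+1} = F P^a_t F^T + Q_{\lambda,t} \tag{6}$$
$$K_{t+1} = P^f_{t+1} H^T \left[H P^f_{t+1} H^T + R_{\phi,t+1}\right]^{-1} \tag{7}$$
$$u^a_{t+1} = u^f_{t+1} + K_{t+1}\left[y_{t+1} - H u^f_{t+1}\right] \tag{8}$$
$$P^a_{t+1} = P^f_{t+1} - K_{t+1} H P^f_{t+1} \tag{9}$$
$$K^s_t = P^a_t F^T \left(P^f_{t+1}\right)^{-1} \tag{10}$$
$$u^s_t = u^a_t + K^s_t \left[u^s_{t+1} - u^f_{t+1}\right] \tag{11}$$
$$P^s_t = P^a_t - K^s_t \left(P^f_{t+1} - P^s_{t+1}\right) (K^s_t)^T \tag{12}$$

where the superscripts $f$, $a$ and $s$ denote forecast, analysis and smoothed quantities,
$F = e^{dt\,A_\varphi}$ with $dt$ the prediction time step, and $H = M^{-1}G$. The smoothing
(Eqs. (10), (11) and (12)) is carried out backward in time with $P^s_{t_N} = P^a_{t_N}$ and $u^s_{t_N} = u^a_{t_N}$.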
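The recursion in Eqs. (5) to (12) can be sketched in NumPy as a forward Kalman filter followed by a backward smoothing pass. This is an illustration rather than the authors' differentiable implementation: the function name `kalman_smoother`, the use of constant `Q` and `R`, and the convention that a row of `y` containing NaN marks a missing observation are all assumptions.

```python
import numpy as np


def kalman_smoother(y, F, H, Q, R, u0, P0):
    """Forward Kalman filter and backward smoothing pass.

    y : observations, shape (N, n); rows containing NaN are treated as missing
    Returns smoothed means/covariances and forecast means/covariances.
    """
    N, d = y.shape[0], u0.shape[0]
    uf = np.zeros((N, d)); Pf = np.zeros((N, d, d))
    ua = np.zeros((N, d)); Pa = np.zeros((N, d, d))
    u_prev, P_prev = u0, P0
    for t in range(N):
        # forecast step, Eqs. (5)-(6)
        uf[t] = F @ u_prev
        Pf[t] = F @ P_prev @ F.T + Q
        if np.any(np.isnan(y[t])):
            ua[t], Pa[t] = uf[t], Pf[t]           # no observation: keep forecast
        else:
            # analysis step, Eqs. (7)-(9)
            S = H @ Pf[t] @ H.T + R
            K = np.linalg.solve(S, H @ Pf[t]).T   # K = Pf H^T S^{-1}
            ua[t] = uf[t] + K @ (y[t] - H @ uf[t])
            Pa[t] = Pf[t] - K @ H @ Pf[t]
        u_prev, P_prev = ua[t], Pa[t]
    # backward pass, Eqs. (10)-(12), initialised with the last analysis
    us, Ps = ua.copy(), Pa.copy()
    for t in range(N - 2, -1, -1):
        Ks = np.linalg.solve(Pf[t + 1], F @ Pa[t]).T   # Ks = Pa F^T Pf^{-1}
        us[t] = ua[t] + Ks @ (us[t + 1] - uf[t + 1])
        Ps[t] = Pa[t] - Ks @ (Pf[t + 1] - Ps[t + 1]) @ Ks.T
    return us, Ps, uf, Pf
```

Because every operation is a matrix product or a linear solve, the same recursion can be written in an automatic-differentiation framework, which is what makes the state-space parameters trainable end to end.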


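Learning Scheme The tuning of the trainable parameter vector $\theta = [\varphi, \lambda, \phi]^T$
(the parameters of the dynamical operator and of the model and observation error
covariances) is carried out using the following loss function:
$\hat{\theta} = \arg\min_\theta \{\gamma_1 \mathcal{L}_1 + \gamma_2 \mathcal{L}_2\}$, where

$$\mathcal{L}_1 = \sum_{t=t_1}^{t_N} \|y_t - H u^s_t\|^2,$$

$$\mathcal{L}_2 = \frac{1}{2} \sum_{t=t_1}^{t_N} \log\left(\left|H P^f_t H^T + R_{\phi,t}\right|\right) + \frac{1}{2} \sum_{t=t_1}^{t_N} \|y_t - H u^f_t\|^2_{H P^f_t H^T + R_{\phi,t}},$$

and $\gamma_1$ and $\gamma_2$ are weighting parameters. Here $\|v\|^2_S = v^T S^{-1} v$ denotes the
squared norm weighted by the inverse of the covariance $S$.

The first term $\mathcal{L}_1$ is simply the quadratic reconstruction error of the observation.
The minimization of this error helps to recover an initial guess of the trainable
parameters. The second term, $\mathcal{L}_2$, is the negative log likelihood of the observations.
This likelihood is derived from the likelihood of the innovation, i.e.
$p(y_{1:T}) = \prod_{t=1}^{T} p(y_t \mid y_{1:t-1})$ [20].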
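The two loss terms can be evaluated from the outputs of a Kalman filter and smoother. The sketch below assumes hypothetical arrays `us`, `uf` and `Pf` holding the smoothed means, forecast means and forecast covariances, a constant observation covariance `R`, and NaN rows for missing observations; the additive constant of the Gaussian log-likelihood is omitted, as in the loss above.

```python
import numpy as np


def e2ekf_losses(y, H, R, us, uf, Pf):
    """Reconstruction error L1 and innovation negative log-likelihood L2.

    y  : observations, shape (N, n); rows containing NaN are skipped
    us : smoothed means, shape (N, d)
    uf : forecast means, shape (N, d)
    Pf : forecast covariances, shape (N, d, d)
    """
    l1, l2 = 0.0, 0.0
    for t in range(y.shape[0]):
        if np.any(np.isnan(y[t])):
            continue
        l1 += np.sum((y[t] - H @ us[t]) ** 2)
        S = H @ Pf[t] @ H.T + R                 # innovation covariance
        innov = y[t] - H @ uf[t]
        _, logdet = np.linalg.slogdet(S)
        l2 += 0.5 * logdet + 0.5 * innov @ np.linalg.solve(S, innov)
    return float(l1), float(l2)
```

Setting the weight of one term to zero recovers either a pure reconstruction objective or a pure likelihood objective.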


3 Numerical Experiments 


3.1 Preliminary Analysis on SST Anomaly Data 


As an illustration of the proposed framework, we consider scalar measurements
of the anomaly of the Sea Surface Temperature (SST) in the Mediterranean Sea
(8.6°N and 43.8°E). The data are computed based on the annual 99th percentile
of SST from model data [21]. The time series consists
of daily measurements of the SST anomaly from 1987 to 2019. The training data is
composed of a sparse sampling of the original time series, as highlighted in Fig. 1a.
The proposed framework is tested with the following configuration: the augmented
state space model is built with $M = I$ and $z_t \in \mathbb{R}^5$. The model error covariance is
a constant matrix of size $d_E \times d_E$ and the observation error covariance is a scalar
parameter that corresponds to the variance of the SST anomaly measurement error.
Finally, the training is carried out with $\gamma_1 = 0$ and $\gamma_2 = 1$.

Figure 1b highlights the reconstruction performance of the smoothing Probability 
Density Function (PDF) with respect to the true (unobserved) state. Interestingly, 
and despite the fact that the observations used to train the parameters of the Kalman 
filtering scheme were extremely sparse, the proposed framework is able to capture
the correct underlying frequencies. Furthermore, the coverage probability of the 
PDF highlights the effectiveness of the estimated model and observations error 
covariances. 

3.2 Shallow Water Equation (SWE) Case-Study


Dataset Description We consider the SWE without wind stress and bottom
friction. The momentum equations are taken to be linear, and the continuity equation
is solved in its nonlinear form. The direct numerical simulation is carried out using a
finite difference method. The size of the domain is set to 1000 km × 1000 km with
a corresponding regular discretization of 80 × 80. The temporal step size was set
to satisfy the Courant–Friedrichs–Lewy condition ($h = 40.41$ s). The data were
subsampled to $h = 40.41 \times 10$ s and 500 time-steps were used as training data. The
models were validated on a series of length 100. As observations, we randomly
sample 1% of the pixels with a temporal coverage given in Fig. 2.


Parametrization of the Data-Driven Models The application of the above framework
to the spatio-temporal reconstruction of sea surface fields should be considered
with care to account for the underlying dimensionality. In this context, and following
several related works [14, 9], a patch-based representation is considered in order
to reduce the computational complexity of the model. Specifically, this patch-based
representation allows a block diagonal modeling of the covariance
matrices, which significantly reduces the computational and memory complexity
of the model. This patch-based representation is fully embedded in the considered
architecture to make explicit both the extraction of the patches from a 2D field and
the reconstruction of a 2D field from the collection of patches. The latter involves a
reconstruction operator $\mathcal{F}$, which is learned from data.

Fig. 2 Daily performance time series: we report the reconstruction performance of the sea surface
elevation and its gradient in (a) and (b) respectively

This patch-level representation is carried out with a fixed shape of 35 × 35 pixels
and a 10-pixel overlap between neighboring patches, resulting in a total of 16
overlapping patches. For each patch $P_i$, $i = 1, \dots, 16$, we learn an EOF basis
$M_{P_i}$ from the training data. We keep the first 20 EOF components, which amount
on average to 95% of the total variance. This patch-based decomposition is shared
among all the tested models. The end-to-end Kalman filter architecture (E2EKF) is
applied on a patch level with an augmented linear model operating on an embedding
of dimension $d_E = 60$. The reconstructed patches are combined through the
reconstruction model $\mathcal{F}$. This model is implemented as a residual, two-block
convolutional neural network. The first block of the network contains four layers
with 6 filters of size $k \times k$ (with $k$ ranging from 3 to 17). The second block involves
5 layers, the first four containing 24 filters and a similar kernel size distribution as
the ones in the first block; the last layer is a linear convolution with a single filter.
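An EOF basis with orthonormal rows, together with the fraction of variance it retains, can be obtained from a singular value decomposition of the centered training data. The sketch below is illustrative only: the function name `eof_basis` and the layout of the data matrix (one flattened patch per row) are assumptions.

```python
import numpy as np


def eof_basis(X, n_components):
    """EOF projection matrix from training snapshots.

    X : data matrix, shape (n_samples, n_features), one flattened patch per row
    Returns M (n_components, n_features) with orthonormal rows, the temporal
    mean, and the fraction of total variance carried by the kept components.
    """
    mean = X.mean(axis=0)
    Xc = X - mean                                  # center the data
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    M = Vt[:n_components]                          # leading right singular vectors
    explained = np.sum(s[:n_components] ** 2) / np.sum(s**2)
    return M, mean, float(explained)
```

For a flattened 35 × 35 patch, `M @ (x - mean)` gives the reduced coordinates of a snapshot `x` and `M.T @ c + mean` maps coefficients `c` back to the patch, since the transpose of `M` acts as its right inverse.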

The proposed technique is compared in this work to the following schemes: 


– Data-driven plug-and-play Kalman filter (KF): In order to show the relevance
of the proposed end-to-end architecture, its plug-and-play counterpart is also
tested. This model exploits the same patch-based augmented linear formulation as
the end-to-end one; however, the parameters of the dynamical model are trained
based on a forecasting criterion and plugged into a Kalman filtering scheme.

– Analog data assimilation (AnDA): We apply the analog data assimilation framework
[14, 7] with a locally linear dynamical kernel and an ensemble Kalman
filter scheme. Please refer to [14, 7] for a detailed description of this data-driven
approach, which relies on nearest-neighbor regression techniques.


Table 1 Surface elevation (η) interpolation experiment: reconstruction correlation coefficient and
root mean squared error (RMSE) over the elevation time series and their gradient. Bold values
denote smallest RMSE and highest percentage correlation

                   Entire map                              Missing data areas
                   RMSE                Correlation         RMSE                Correlation
Model              η (m)    ∇η (m/°)   η        ∇η         η (m)    ∇η (m/°)   η        ∇η
Proposed, E2EKF    0.046    0.009      73.10%   41.89%     0.047    0.010      73.80%   41.90%
AnDA               0.058    0.011      52.74%   35.91%     0.060    0.011      52.82%   21.25%
KF                 0.060    0.010      64.57%   21.21%     0.059    0.010      64.68%   36.06%


Following [14], an EOF based post-processing step is applied to all the reconstructions.
Furthermore, in this experiment, we only report the reconstruction
performance of the mean component, as a relevant benchmark of the uncertainty
of the above data-driven models would be out of the scope of this paper. Thus,
the model and observation error covariances are assumed to be known matrices
with appropriate dimensions, and the training of the proposed model is carried out with
$\gamma_1 = 1$ and $\gamma_2 = 0$.


Reconstruction Performance of the Proposed Data-Driven Models. A quantitative
analysis of the benchmark is given in Table 1 based on (i) a mean RMSE
criterion and (ii) a mean correlation coefficient criterion of the interpolated fields
as well as their gradients. The RMSE and correlation coefficient time series, as
well as the spatial coverage of the observations, are also reported in Fig. 2. Overall,
the proposed end-to-end architecture leads to very significant improvements with
respect to the state-of-the-art AnDA technique, as well as to its plug-and-play
counterpart, both in terms of RMSE and correlation coefficients. These results
emphasize the importance of the end-to-end methodology with respect to classical
plug-and-play techniques: in data-assimilation applications, and as shown
by [16, 10], the reconstruction performance depends not only on the quality of the
dynamical prior but also on the provided measurements and their sampling.
Classical plug-and-play techniques, in contrast to end-to-end strategies, ignore
the latter source of information, which explains the performance of our framework.
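
For concreteness, the two scores reported in Table 1 can be sketched in a few
lines of Python. This is only an illustration under assumptions: the function
names, the use of NumPy finite differences for the gradient and the optional
`mask` argument (a hypothetical way of restricting the scores to missing-data
areas) are choices made here, and the exact averaging used for the table is not
specified in this section.

```python
import numpy as np

def rmse(a, b):
    """Root-mean-square error between two arrays of equal shape."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

def correlation(a, b):
    """Pearson correlation coefficient between two arrays (flattened)."""
    return float(np.corrcoef(np.ravel(a), np.ravel(b))[0, 1])

def gradient_norm(field, spacing=1.0):
    """Norm of the finite-difference gradient of a 2-D field."""
    d_row, d_col = np.gradient(field, spacing)
    return np.hypot(d_row, d_col)

def scores(truth, estimate, mask=None):
    """RMSE and correlation of a field and of its gradient norm.

    mask: optional boolean array; if given, only pixels where it is True
    are scored (assumed stand-in for the "missing data areas" columns).
    """
    if mask is None:
        mask = np.ones(truth.shape, dtype=bool)
    g_true, g_est = gradient_norm(truth), gradient_norm(estimate)
    return {
        "rmse": rmse(truth[mask], estimate[mask]),
        "corr": correlation(truth[mask], estimate[mask]),
        "rmse_grad": rmse(g_true[mask], g_est[mask]),
        "corr_grad": correlation(g_true[mask], g_est[mask]),
    }

# Synthetic example: a smooth field and a noisy reconstruction of it.
rng = np.random.default_rng(0)
y, x = np.mgrid[0:40, 0:40] / 6.0
truth = np.sin(x) + np.cos(y)
estimate = truth + 0.05 * rng.standard_normal(truth.shape)
result = scores(truth, estimate)
```

Scoring the gradient separately, as in Table 1, penalises reconstructions that
match the large-scale field but smooth out fine-scale structures.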


Qualitative Analysis of the Proposed Schemes. The conclusions of the quantitative
analysis are also illustrated through the visual analysis of the reconstructed surface
elevation and its gradient in Fig. 3. Interestingly, this visual analysis reveals that the
AnDA technique tends to smooth out fine-scale patterns. By contrast, the Kalman
filter based schemes (in both the end-to-end and plug-and-play versions) achieve a
better reproduction of fine-scale structures, illustrated for instance by the gradients
Fig. 3 Interpolation example of the surface elevation field: first row, the reference surface
elevation, its gradient and the observation with missing data; second row, interpolation results 
using respectively the plug-and-play Augmented Koopman Kalman filter, AnDA, and the proposed 
E2EKF; third row, gradient of the reconstructed fields 


of the field. The analysis of the spectral signatures in Fig. 4 leads to similar
conclusions: compared to the state-of-the-art AnDA technique, as well
as to its plug-and-play counterpart, the proposed end-to-end architecture leads
to significant improvements, especially regarding the reproduction of the gradient
energy level.


4 Conclusion 


Spatio-temporal interpolation applications are important in the context of ocean
surface modeling. For this reason, deriving new data assimilation architectures that
can fully exploit the observations and the current advances in signal processing,
modeling and artificial intelligence is crucial. In this context, this work investigated
the ability of augmented linear state space models to solve smoothing problems for
ocean surface observations using the Kalman filter.
Fig. 4 Spectral comparison of the tested models: the averaged power spectral densities and their
error with respect to the ground truth are given in (a) and (b) respectively 


Beyond filtering and smoothing applications, we believe that the proposed
framework provides an initial playground for learning approximate linear state
space models of real observations. Given a sequence of sparse observations, the
proposed framework may be able to unfold large-scale frequencies that are useful
for prediction. Interesting case studies include sea level rise and the increase of the
anomaly of the sea surface temperature.
Dynamical Properties of Weather Regime
Transitions


Paul Platzer, Bertrand Chapron, and Pierre Tandeo 


Abstract Large-scale weather can often be successfully described using a small
number of patterns. A statistical description of reanalysed pressure fields identifies
these recurring patterns with clusters in state-space, also called “regimes”. Recently, 
these weather regimes have been described through instantaneous, local indicators 
of dimension and persistence, borrowed from dynamical systems theory and 
extreme value theory. Using similar indicators and going further, we focus here 
on weather regime transitions. We use 60 years of winter-time sea-level pressure 
reanalysis data centered on the North-Atlantic ocean and western Europe. These 
experiments reveal regime-dependent behaviours of dimension and persistence near 
transitions, although on average one observes an increase of dimension and a
decrease of persistence near transitions. The effect of transition on persistence is 
stronger and lasts longer than on dimension. These findings confirm the relevance 
of such dynamical indicators for the study of large-scale weather regimes, and reveal 
their potential to be used for both the understanding and detection of weather regime 
transitions. 


Keywords Weather - Regime - Transition - Shift - Dynamical systems - 
Dimension - Persistence 


1 Introduction 


The concept of weather regime was introduced in 1949 by [1]. Broadly speaking,
weather regimes are recurring, quasi-stationary states of the atmosphere, which
make it possible to describe most of the subseasonal variability of atmospheric
states, the latter being defined through large-scale maps of either mean sea-level
pressure or geopotential height. The study of weather regimes has numerous potential
applications as a tool to understand subseasonal atmospheric dynamics [2]. The
understanding and correct representation of weather regimes is also paramount for
adequate climate projections [3].

Vautard [4] defines weather regimes through stationarity and searches for 
geopotential fields with a quasi-vanishing time-derivative. Others (see e.g. [5]) 
use cluster analysis (e.g. k-means or Gaussian Mixture Models) to find recurring
patterns. To perform such analyses, one usually uses a low-order description of the 
atmospheric state, through empirical orthogonal functions (EOFs). Some authors 
simply rely on projection on a low number of EOFs (two in the case of [6]), and 
on forecaster’s empirical knowledge of the recurrence of regimes defined through 
positive and negative phases of dominant EOFs. 

A natural concern is not only the definition of weather regimes, but also the study
of their transitions [5]. Statistical tools such as random forests can be used to perform
such a task [7]. The performance of physics-based weather forecasts can also be
assessed through their ability to predict weather regime transitions [6]. Our study of
weather regime transitions is notably motivated by the relevance and difficulty of
their forecast.

We aim to focus on the time-evolution of two dynamical indicators (local 
dimension and persistence) around transitions between winter-time, North-Atlantic 
weather regimes. These indicators are relevant to the study of Atlantic-European 
weather regimes, as each weather regime can be associated with specific values of 
these indicators [8]. From this static study of weather regimes, we carry on with a 
dynamic study of transitions. 

Note that [9] already investigated the temporal behaviour of local dimension and
persistence at the mature stage of seven regimes, used to define year-round
sub-seasonal variability of weather over the North-Atlantic and western Europe.
These mature stages were identified as local minima of the weather regime index
defined by [10] as the projection of the instantaneous atmospheric state on the
atmospheric state associated with each regime. Hochman et al. [9] showed that the
so-defined mature stages of weather regimes coincided with locally low values of the
dimension and inverse persistence, and that these mature stages were both preceded
and followed by higher relative values of these indicators. The present paper is
concerned with weather regime transitions, which are located between weather
regime mature stages. We therefore expect to confirm the relatively higher values of
dimension and inverse persistence observed by [9] before and after regime mature
stages. However, our study could reveal varying behaviours as we focus on transitions
from one specific regime to another, while the study of [9] does not specify which
regime precedes or follows a given mature stage.

Our analysis also bears similarity with that of [11], in which the temporal
behaviour of local dimension and persistence during Eastern Mediterranean cold
spells was examined. The main difference with the present study is the nature of the
event of interest: we are interested in transitions between weather regimes, while
cold spells could be viewed as a special type of weather regime (a particular case
of Cyprus Lows, which are the dominant regime responsible for precipitation in the
Eastern Mediterranean region).

The next section is the core of our paper and reviews the results of our study, 
describing salient features of the time-evolution of dimension and persistence 
around transitions between four winter-time North-Atlantic weather regimes. The 
following section draws perspectives and proposes potential applications to real- 
world meteorological issues. Appendix sections provide details on the tools and data
used in the present study. 


2 European-Atlantic Weather Regime Transitions 


An EOF decomposition (see section “Empirical Orthogonal Functions”) is performed
on the winter-time, reanalysed sea-level pressure fields described in Appendix 1.
A weather-regime analysis follows using a Gaussian Mixture Model with four
modes, corresponding to four weather regimes, in a reduced space spanned by the
first three EOFs (see section “Gaussian Mixture Model” for a discussion). The
resulting regimes are shown in Fig. 1 in EOF space and their centroids are shown in
Fig. 2 as SLP-anomaly maps.

Figure 1 illustrates that the four regimes are mostly defined through EOF1
and EOF2, as the centroids’ EOF3-coordinates are close to zero. Two regimes
are associated with the positive and negative phases of the first EOF, corresponding
to a strong north-south pressure gradient (see Fig. 2), and we label these regimes
NAO+ and NAO- to match previous works in the literature. The two other regimes are
Fig. 1 Weather regimes as cluster distributions from the fit of a Gaussian Mixture Model to winter-
time sea-level-pressure anomaly (SLPa) from reanalysis data. The fit is performed in reduced
space through projection of SLPa maps on three leading empirical orthogonal functions (EOF).
Colored contours show the $0.75\sigma$ (thick lines) and $1.25\sigma$ (thin lines) ellipses of each
distribution around their centroids, with $\sigma$ denoting standard deviation. Grey contours show
the whole GMM distribution through marginal distributions in two-dimensional EOF-subspaces.
Regime names are assigned from comparison with other scientific studies found in the literature
(see Fig. 2)
Fig. 2 Weather regimes as sea-level-pressure anomalies in (longitude, latitude) coordinates
(coastlines are shown), defined by the distributions’ centroids from a Gaussian Mixture Model 
(see Fig. 1 and section “Gaussian Mixture Model”). Regime names are assigned from comparison 
with other scientific studies found in the literature 


associated with a pressure system covering western Europe and extending far off
Europe’s west coast. The regime corresponding to an anticyclonic situation over
western Europe is termed BLO+, and its opposite phase is termed BLO-, in
accordance with previous studies on such regimes. Note that the small contribution
of EOF3 to the definition of BLO+ and BLO- induces a slight westward shift of
the BLO- pressure system compared to that of BLO+.

Then, we follow [5] and assign each SLP-anomaly field to a weather regime if
it lies inside the $1.25\sigma$ ellipses shown in Fig. 1 (in cases of points belonging to
two regimes, we assign the regime with the highest probability); otherwise no regime
is assigned. Next, for any regimes “A” and “B”, a transition from regime “A”
to regime “B” is defined as either the consecutive passing from “A” to “B” or
the consecutive passing from “A” to “no regime” and then to “B” (note that this
allows transitions from a regime to itself). As we are interested in the behaviour of
dynamical indicators around transitions, we discard transitions of the type
“A” → “no regime” → “B” if the “no regime” phase exceeds 24 h.
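
The transition rule above can be sketched as a short scan over a sequence of
regime labels. This is a minimal illustration, not the code used for this study:
the function name, the use of `None` for “no regime”, the choice of the first
time step in regime “B” as the transition time, and the default of 8 steps
(8 × 3 h = 24 h) are assumptions made here.

```python
def find_transitions(labels, max_gap=8):
    """Return (index, from_regime, to_regime) for each regime transition.

    labels  : sequence of regime names, with None meaning "no regime"
    max_gap : longest allowed "no regime" run, in time steps
              (assumed: 8 steps of 3 h = 24 h)
    The returned index is the first time step spent in the arrival regime
    (a convention chosen here).
    """
    transitions = []
    last_regime, last_index = None, None
    for i, label in enumerate(labels):
        if label is None:
            continue
        if last_regime is not None:
            gap = i - last_index - 1              # "no regime" steps in between
            direct_change = gap == 0 and label != last_regime
            via_no_regime = 0 < gap <= max_gap    # allows "A" -> "A"
            if direct_change or via_no_regime:
                transitions.append((i, last_regime, label))
        last_regime, last_index = label, i
    return transitions

# Toy sequence: A -> B directly, B -> B through one "no regime" step,
# then B -> A through a "no regime" phase that is too long to be kept.
labels = ["A", "A", "B", None, "B", None, None, None, "A"]
found = find_transitions(labels, max_gap=2)
```

With `max_gap=2`, the last passage from “B” to “A” is discarded because its
“no regime” phase lasts three steps.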
3 Dimensionality Around Transitions


The local dimension of sea-level pressure fields is used as an indicator of the state 
of the atmosphere. Details on this indicator and how it is computed can be found in
section “Local Dimensions”. 

In Fig. 3, one observes statistics of dimension-versus-time profiles centered on 
transitions. The number of transitions on which the statistics were computed is also 
mentioned, showing preferred transitions in agreement with [5]. Several behaviours 
can be observed. 
Fig. 3 Typical profiles of local dimension versus time, centered at transition point, for each
possible transition. Light (resp. dark) greys fill between the 0.05 and 0.95 (resp. 0.25 and 0.75)
quantiles, while the dark lines show the average dimension profile around transition from regime
“A” to regime “B”. In red, statistics over each regime (with no restriction to transitions) are shown.
Red dotted (resp. dashed) lines show the 0.05 and 0.95 (resp. 0.25 and 0.75) quantiles, while the
full red lines show the average dimension of regime “A” and “B”
Smooth transition. The transition BLO+ → NAO- shows a smooth transition
from the dimension statistics of regime BLO+ to the statistics of regime NAO-
over a transition period of ~1 day, starting after the transition, with no particular
behaviour at the transition itself.

Dimension overshoot. Right before, after, or during transitions NAO- →
BLO-, BLO- → BLO+, BLO+ → BLO-, and BLO+ → NAO+, the local
dimension statistics exceed what is expected from statistics computed over each
regime. Transitions BLO- → BLO+ and BLO+ → BLO- show the highest
intensity of dimension overshoot (around +1 in dimension), with the average
dimension near transition (black, full) reaching the 0.75 quantile of the regime
distributions (dashed, red). For transition BLO- → BLO+, the overshoot occurs
~1 day after the transition, while for BLO+ → BLO- it occurs 1 day before.
In both cases, transition-statistics (black, grey) are very similar to the BLO-
regime-statistics (red), while the overshoot occurs in the BLO+ phase, and is
preceded or followed by an undershoot.

Time-symmetry. From the previous description, it appears that the dimension
statistics around transition BLO- → BLO+ are almost symmetric to those around
BLO+ → BLO-: the latter can be recovered by taking the former in reverse
time. Similar types of symmetry can be observed in transitions BLO+ ↔ NAO+,
BLO- ↔ NAO+, and BLO+ ↔ NAO-, although with less confidence.

Time-asymmetry. On the other hand, the transition NAO- → BLO- shows a
slight overshoot of dimension statistics at the transition point, while the transition
BLO- → NAO- shows an overshoot of dimension statistics away from the
transition point (~2 days before and after).


Auto-transitions are harder to interpret than normal transitions. They correspond 
to trajectories in phase-space where the system goes from a well-defined regime 
to a mixed, undefined regime, and then comes back to the initial well-defined 
regime. It is likely that these auto-transitions actually mix different types of 
transient behaviours, with different properties. Auto-transition NAO+ → NAO+
seems to show an overshoot of dimension near the transition point, but the number 
of transitions (57) is small and therefore only low confidence is attributed to 
these statistics. Other auto-transition statistics are rather smooth and close to the 
corresponding regime-statistics, which might be due to the fact that auto-transitions 
mix different types of transient behaviours. 

Figure 5b shows dimension statistics for all transitions, excluding auto-
transitions. It shows a slight dimension overshoot at the transition point ±1 day.
The fact that this overshoot is so small is an indicator of the variety of behaviours 
near transition, depending on which regimes are involved. 
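
The transition-centred statistics shown in Figs. 3 and 5b (mean and quantile
profiles of an indicator as a function of time lag) can be sketched as follows.
The function name and arguments are illustrative assumptions; in particular,
windows that would cross the gap between two consecutive winters would have to
be excluded in practice, which this sketch does not do.

```python
import numpy as np

def composite_profiles(series, centers, half_width,
                       quantiles=(0.05, 0.25, 0.75, 0.95)):
    """Mean and quantile profiles of `series` in windows centred on `centers`.

    series     : 1-D array of an indicator (e.g. local dimension) in time order
    centers    : indices of the transition times
    half_width : number of time steps kept on each side of a transition
    Windows that do not fit inside the series are skipped; at least one
    window must remain.
    """
    series = np.asarray(series, dtype=float)
    windows = [series[c - half_width:c + half_width + 1]
               for c in centers
               if c - half_width >= 0 and c + half_width < len(series)]
    stack = np.vstack(windows)            # shape (n_transitions, 2*half_width + 1)
    lags = np.arange(-half_width, half_width + 1)
    return lags, stack.mean(axis=0), np.quantile(stack, quantiles, axis=0)

# Toy example on a linear ramp: the window around index 98 does not fit.
lags, mean_profile, quantile_profiles = composite_profiles(
    np.arange(100.0), centers=[10, 50, 98], half_width=3)
```

With 3-hourly data, `half_width=24` would correspond to ±3 days around each
transition.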


4 Persistence Around Transitions 


We now use the inverse persistence $\theta$ (also called extremal index) of sea-level
pressure fields as an indicator of the state of the atmosphere. Details on this indicator
and how it is computed can be found in section “Inverse Persistence $\theta$”.

In Fig. 4, we show the result of the same procedure followed in the previous
section, but replacing the local dimension by the inverse persistence. As these two
variables are correlated, the behaviour of inverse persistence resembles that
of dimension around many of the observed transitions. However, the difference
between transition-statistics and regime-statistics appears to be more significant for
$\theta$ than for the dimension, with some special behaviours described below.
Fig. 4 Same as Fig. 3, but for the inverse persistence $\theta$ (also called extremal index). High values
indicate a rapidly changing dynamical system
Transitions to/from BLO+. The BLO+ regime-statistics of $\theta$ are much higher
than the ones of other regimes, with most values concentrated between 0.17 and
0.19, and almost all values above 0.16. We therefore see high variations of $\theta$
around transitions from or to BLO+. However, when one is in the regime BLO+,
either after or before a transition, we do not observe an overshoot as with the
dimension. Rather, we see that the transition-statistics match the BLO+ statistics
very near the transition point, while they are much lower 2-3 days away from the
transition. This means that, in the regime BLO+, the inverse persistence is much
lower either 2-3 days before or 2-3 days after any transition. Also, the values
of $\theta$ in regimes NAO+ and BLO-, up to at least three days around a transition
from or to BLO+, are much higher than expected from intra-regime statistics.
We can interpret these facts using the results of [9], who observed a strong
decrease of $\theta$ when weather regimes are well-installed. Therefore, what we see in
Figs. 3d, h, l, m-p and 4d, h, l, m-p indicates that the system rapidly exits/enters
regime BLO+, while it needs more time to exit/enter neighbouring regimes when
transitioning from or to BLO+.

BLO- ↔ NAO+. Although the NAO+ and BLO- intra-regime statistics of $\theta$ are
significantly different, BLO- ↔ NAO+ transition-statistics of $\theta$ are relatively
smooth in time, showing very few variations, and closer to the NAO+ intra-
regime statistics. Again, this can be interpreted as a slow transition.

Low-quantiles overshoot. From Fig. 5, one can see that, while all quantiles
of dimension seem to be affected equally around transitions (Fig. 5b), it is
mostly the low quantiles of inverse persistence which are affected by transitions
(Fig. 5a). That is, values of $\theta$ are not expected to be especially large near
transitions (compared to average statistics), but small values of $\theta$ are expected
to be extremely unlikely around transitions.
Fig. 5 In grey: statistics (0.05, 0.25, 0.75 and 0.95 quantiles, as well as mean) of inverse
persistence (a) and local dimension (b) over all transitions, discarding auto-transitions (from
regime “A” to “A”). In red: statistics (0.05, 0.25, 0.75 and 0.95 quantiles, as well as mean) over
all values from the dataset (winter-time from 1956 to 2015), without restriction to transitions
As mentioned earlier, we discard transitions “A” → “B” if the “no regime”
phase between regimes “A” and “B” exceeds 24 h. Raising the maximum length
of this “no regime” phase yields more transitions and results in a slight
smoothing of the profiles of Figs. 3 and 5, but the observed tendencies remain.
Reducing the maximum length of the “no regime” phase between regimes “A” and
“B” results in slightly sharper, yet noisier profiles (not shown).


5 Conclusion and Perspectives 


The analysis of reanalysed sea-level pressure maps covering a large part of
the North-Atlantic ocean and western Europe demonstrates that local dynamical
indicators of dimension and persistence display great sensitivity to transitions
between weather regimes. In particular, we observe higher values of dimension and
lower values of persistence near transitions, which is in agreement both with the
early definition of weather regimes (as quasi-stationary, low-order recurring states)
and with recent studies of weather regimes through these same two dynamical
indicators. The study reveals non-homogeneous behaviour of these indicators near
transitions, meaning that different transitions show different signatures in terms
of time-variation of dimension and persistence. Furthermore, we observe that the
fingerprint of transitions is more pronounced for persistence than for dimension,
and that it spreads over a larger duration (more than ±3 days for persistence but
around ±1.5 days for dimension).

This study, combined with recent studies on weather regimes and dynamical
indicators, confirms the relevance of these indicators for the understanding of weather
regimes, and even reveals the potential for these indicators to be used in the definition
of weather regimes. Present findings also indicate that each transition could be
identified through the time-behaviour of dimension and persistence. This has great
implications and should motivate further investigations on how to use these indicators
for the purpose of detecting regime transitions. However, for each transition we
still observe a great variability of time-profiles of dimension and persistence. This
suggests using a variety of related indicators, and not only these two. Recent studies
have used these indicators on separated scales, making it possible to explore variations
in dimensionality and persistence of small-scale variables [23]. Our current analyses
also reveal a signature of large-scale weather regime transitions in the time-variation
of small-scale dimension and persistence, although with less intensity than for large-
scale dynamical indicators (not shown). We interpret this as a hint that small-scale
organization may be necessary for large-scale transitions. Other local indicators also
based on analogues, such as the ones used by [24] and [25], should also be considered
in an attempt to predict transitions.
Appendix 1: Data Description: Twentieth Century Reanalysis


We use data from the third version of the Twentieth Century Reanalysis, which
combines surface observations of synoptic pressure and NOAA’s Global Forecast
System, and prescribes sea surface temperature and sea ice distribution [12].

From this reanalysis we extract the ensemble-mean sea-level pressure maps
from year 1956 to 2015, at 3 h intervals. We do not use preceding years in order to
avoid inconsistency between past, observation-scarce data, and more recent data, 
better constrained by observations. We could also have selected only data from 
the satellite era starting in 1979, but this would have diminished the statistical 
significance of our work. 

We focus on a 41 × 41 grid at 1° resolution covering longitudes 30W ≤ LON ≤ 10E
and latitudes 30N ≤ LAT ≤ 70N, including western Europe and the eastern part of the
North-Atlantic Ocean (see Fig. 2). We use only extended-winter data, from October
to March, as is typical in North-Atlantic weather-regime studies (see e.g., [9, 6, 8]).


Appendix 2: Statistical Descriptors 


Empirical Orthogonal Functions 


To study winter-time SLP fields, we use the empirical orthogonal function
decomposition, also called principal component analysis [13]. It allows one to decompose
any spatial field (snapshot) of SLP-anomaly (SLPa) onto orthogonal maps (EOFs),
ordered by their respective contribution to the total variability in time of SLPa fields.
To compute SLPa, we remove a moving seasonal-average using data from ±10 years
and ±5 calendar-days, with a Gaussian kernel to give more weight to neighbouring
years and calendar days.

In our case, EOFs n° 1-7 contribute respectively 41%, 24%, 14%, 5.5%, 4.8%,
2.2% and 1.5% of the total signal variance. Note that, for our analyses of weather
regimes, we use only EOFs n° 1-3, which contribute collectively 79% of the total
variance.
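
As an illustration of how such variance fractions are obtained, the sketch
below computes EOFs from a (time, space) anomaly matrix with a singular value
decomposition. It is a generic sketch, not the code used for this study: the
function name is chosen here, and no latitude (area) weighting is applied, which
is an assumption since the text does not say whether one was used.

```python
import numpy as np

def eof_decomposition(anomalies, n_modes):
    """EOFs of a (time, space) anomaly matrix via singular value decomposition.

    Returns the leading spatial patterns (EOFs), the associated time
    coefficients, and the fraction of total variance carried by every mode.
    """
    X = anomalies - anomalies.mean(axis=0)        # remove the time mean per grid point
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    explained = s**2 / np.sum(s**2)               # variance fractions, in decreasing order
    eofs = Vt[:n_modes]                           # shape (n_modes, space), orthonormal rows
    pcs = U[:, :n_modes] * s[:n_modes]            # shape (time, n_modes)
    return eofs, pcs, explained

# Toy example with random data standing in for flattened SLPa maps.
rng = np.random.default_rng(0)
data = rng.standard_normal((200, 30))
eofs, pcs, explained = eof_decomposition(data, n_modes=3)
cumulative = explained[:3].sum()                  # analogue of 41% + 24% + 14% = 79%
```

The time coefficients `pcs` are the coordinates of each map in the reduced
EOF space in which the weather regimes are then fitted.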
Gaussian Mixture Model


A Gaussian Mixture Model (GMM) assumes that the random variable it describes is 
the result of pooling from a finite number of sub-populations (in our case, regimes) 
whose distributions are Gaussian [14]. Expectation-maximization (EM) allows one to
find optimal parameters (averages and covariances) of the Gaussian distributions, 
once the number of regimes has been fixed. 

We follow [5], and make a GMM EM-fit using a finite number of EOFs. As
we allow the covariances to have any possible shape, the number of parameters
to be optimized grows quadratically with the number of EOFs kept; we therefore
have not tried using more than 5 EOFs. Then, once the number of EOFs is fixed,
a trade-off between the number of parameters (dictated by the number of regimes)
and the model adequacy to the data can be found by computing either the Bayesian
Information Criterion or the average log-likelihood over an independent set [16].
However, as in the study by [5], we find a very low sensitivity of these indicators to
the number of regimes chosen (not shown). We also compute the Silhouette score
proposed by [15] to estimate the degree of overlapping between regimes, and find
that using more EOFs always leads to more overlapping, and so does using more
regimes but to a lesser extent (not shown).

In the end, we make the choice of keeping 3 EOFs and 4 regimes. The choice of 3
EOFs is motivated by the fact that each of the first three EOFs accounts for more than
10% of the total variance, while EOFs n°4 and further only represent up to ~5%.
This has the consequence that, even when we retain more than 3 EOFs, the regime
centroids found through GMM EM-fits are mostly defined by their projection on
the first 3 EOFs, as projections on EOFs 4 and 5 are always closer to 0 than one
of the other projections (not shown). The choice of 4 regimes is motivated by the
adequacy with other studies [6] and operational weather-forecasting services such
as ECMWF, which divides into 4 quadrants the reduced space formed by the projection
of geopotential height fields onto their corresponding first two EOFs.


Appendix 3: Dynamical Indicators 


Local Dimensions 


We use the same estimator of local dimension as [8], borrowing the Python code
from the Chaotic Dynamical Systems Kit (https://github.com/yrobink/CDSK). This
estimator is based on a definition of local dimension at any point $z$ in state-space
through the extreme-value distribution of the observable
$g_z : x \mapsto g_z(x) = -\log \mathrm{dist}(z, x)$ for any other state-space vector $x$
(where “dist” is any distance in the mathematical sense). Large values of this
observable are found for points $x$ close to $z$: these points are called “analogues”
of $z$ in the atmospheric- and ocean-sciences community. Then, the probability that
$g_z(x)$ exceeds a given threshold $p$ is exponential (see, for instance, [17]):

$$P\left(g_z(x) > p\right) \propto \exp\left(-p\, d(z)\right), \qquad \text{(A.1)}$$


where $d(z)$ is the local dimension that we estimate here. The geometric
interpretation of this dimension is that in a space of dimension $d$, the typical number of
points inside a sphere of radius $r$ scales as $r^d$. Although such an interpretation of
dimension has been connected to the distances to analogues for a long time (see
for instance [18] and the famous Grassberger-Procaccia algorithm [19]), only recent
works have used extreme-value theory to provide instantaneous, local estimators
of dimension [20]. These recent tools are particularly suited for the study of local
behaviours, while previous works focused on average, global indicators.

Recently, distances between analogues x and their target z have been shown 
to follow distributions whose parameters are given by the length of the available 
dataset, the analogue rank, and the local dimension as estimated in this paper [21]. 
This indicator is thus both relevant from a dynamical systems point of view and for 
practical use of data-based methods. 
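
To illustrate the idea, an exponential tail with rate $d(z)$ means that the
excesses of $g_z$ over a high threshold have mean $1/d(z)$, which suggests the
simple estimator sketched below. This is not the CDSK code: the function name,
the Euclidean distance and the 0.98 quantile used as threshold are assumptions
made for this example.

```python
import numpy as np

def local_dimension(z, data, quantile=0.98):
    """Estimate the local dimension at z from the points in `data`.

    g = -log(dist(z, x)) is computed for every x in data; excesses of g over
    a high quantile are treated as exponential with rate d, so that
    d is estimated as 1 / mean(excess).
    """
    dist = np.linalg.norm(data - z, axis=1)       # Euclidean distance (assumed)
    dist = dist[dist > 0]                         # drop z itself if it is in the dataset
    g = -np.log(dist)
    threshold = np.quantile(g, quantile)
    excess = g[g > threshold] - threshold         # closest analogues of z
    return 1.0 / excess.mean()

# Sanity check on points spread uniformly in a square and in a cube,
# for which the local dimension at the centre should be close to 2 and 3.
rng = np.random.default_rng(0)
d_square = local_dimension(np.full(2, 0.5), rng.uniform(size=(20000, 2)))
d_cube = local_dimension(np.full(3, 0.5), rng.uniform(size=(20000, 3)))
```

Applied to SLP data, `data` would hold the flattened pressure maps and `z` the
map at the time of interest, giving one dimension value per time step.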


Inverse Persistence $\theta$


However, Eq. A.1 is not valid when the system passes close to a fixed point, as this 
causes trajectories to slow down. In this case, another parameter called the extremal 
index, or inverse persistence, comes into play: 


$$P\left(g_z(x) > p\right) \propto \exp\left(-p\,\theta(z)\, d(z)\right), \qquad \text{(A.2)}$$


with $0 < \theta(z) < 1$. Low values of $\theta$ correspond to highly persistent areas of
state-space. It can be interpreted as the inverse mean residence time within a sphere
centered on $z$ (if divided by the time-increment between two consecutive points in
the dataset, which is 3 h in our case). We estimate this parameter with the Süveges
likelihood estimator [22]. It is based on counting consecutive points inside a ball
centered on $z$ (i.e., analogues of the same point $z$ that are also consecutive points in
the time-ordered dataset).
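
A possible sketch of such an estimator is given below, using the closed-form
expression commonly quoted for the Süveges estimator in terms of the gaps
between consecutive threshold exceedances. The formula is reproduced here as an
assumption and should be checked against [22]; the function name and the 0.98
quantile are also choices made for this example.

```python
import numpy as np

def extremal_index_sueveges(g, quantile=0.98):
    """Closed-form Süveges-type estimate of the extremal index of a series g.

    g is a time-ordered series (here, g_z evaluated along the trajectory).
    Returns NaN when the estimate is undefined (fewer than two exceedances,
    or all exceedances consecutive).
    """
    g = np.asarray(g, dtype=float)
    threshold = np.quantile(g, quantile)
    q = 1.0 - quantile                            # exceedance probability
    times = np.flatnonzero(g > threshold)         # time indices of exceedances
    if times.size < 2:
        return np.nan
    S = np.diff(times) - 1                        # gaps between exceedances (0 if consecutive)
    N = S.size
    Nc = np.count_nonzero(S > 0)                  # number of non-zero gaps
    a = q * S.sum()
    if a == 0:
        return np.nan
    b = a + N + Nc
    return (b - np.sqrt(b * b - 8.0 * Nc * a)) / (2.0 * a)

# Independent noise (no persistence) versus the same kind of noise with
# every value held for 5 consecutive time steps (clusters of length 5).
rng = np.random.default_rng(0)
theta_iid = extremal_index_sueveges(rng.standard_normal(50000))
theta_held = extremal_index_sueveges(np.repeat(rng.standard_normal(10000), 5))
```

In the second series exceedances come in runs of five time steps, so the
estimate should be close to 1/5, in line with the interpretation of $\theta$ as an
inverse mean residence time.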
Frequentist Perspective on Robust
Parameter Estimation Using the
Ensemble Kalman Filter


Sebastian Reich 


Abstract Standard maximum likelihood or Bayesian approaches to parameter 
estimation for stochastic differential equations are not robust to perturbations in 
the continuous-in-time data. In this paper, we give a rather elementary explanation 
of this observation in the context of continuous-time parameter estimation using an 
ensemble Kalman filter. We employ the frequentist perspective to shed new light 
on two robust estimation techniques, namely subsampling the data and rough path
corrections. We illustrate our findings through a simple numerical experiment. 


Keywords Parameter estimation - Stochastic differential equations - Ensemble 
Kalman filter - Frequentist approach - Rough path theory 


1 Introduction 


In this note, we consider the well-studied problem of parameter estimation for
stochastic differential equations (SDEs) from continuous-time observations
$X_t^\dagger$, $t \in [0, T]$ [25]. It is well-known that the corresponding maximum likelihood
estimator does not depend continuously on the observations $X_t^\dagger$, $t \in [0, T]$, which
can result in a systematic estimation bias [27, 14]. In other words, the maximum likelihood
estimator is not robust with respect to perturbations in the observations. Here, we
revisit this problem from the perspective of online (time-continuous) parameter
estimation [6, 11] using the popular ensemble Kalman filter (EnKF) and its
continuous-time ensemble Kalman-Bucy filter (EnKBF) formulations [15, 10, 26].
As for the corresponding maximum likelihood approaches, the EnKBF does not
depend continuously on the incoming observations $X_t^\dagger$, $t > 0$, with respect to
the uniform norm topology on the space of continuous functions. This fact was
first investigated in [9] using rough path theory [16]. In particular, as already
demonstrated for the related maximum likelihood estimator in [14], rough path
theory allows one to specify an appropriately generalised topology which leads to
a continuous dependence of the EnKBF estimators on the observations. Here we
expand the analysis of [9] to a frequentist analysis of the EnKBF in the spirit of [29],
where the primary focus is on the expected behaviour of the EnKBF estimators over
all admissible observation paths. One recovers that the discontinuous dependence
of the EnKBF estimators on the driving observations results in a systematic bias
from a frequentist perspective. This is also a well known fact for SDEs driven by
multiplicative noise [23].

The proposed frequentist perspective naturally enables the study of known bias 
correction methods, such as subsampling the data [27], as well as novel de-biasing 
approaches in the context of the EnKBF. 

To keep the mathematical analysis elementary, we consider only the simplified
problem of parameter estimation for linear SDEs. This restriction allows us to
avoid certain technicalities from rough path theory and enables a
straightforward application of the numerical rough path approach put forward in
[13]. As a result, we are able to demonstrate that the popular approach of
subsampling the data [2, 27, 5] can be well justified from a frequentist
perspective. The frequentist perspective also suggests a natural approach to
estimating the required correction term when an EnKBF is implemented without
subsampling.

We end this introductory paragraph with a reference to [1], which includes a 
broad survey on alternative estimation techniques. We also point to [9] for an in- 
depth discussion of rough path theory in connection to filtering and parameter 
estimation. 

The remainder of this paper is structured as follows. The problem setting and the 
EnKBF are introduced in the subsequent Sect. 2. The frequentist perspective and its 
implications on the specific implementations of an EnKBF in the context of low 
and high frequency data assimilation are laid out in Sect. 3. The importance of these 
considerations becomes transparent when applying the EnKBF to perturbed data 
in Sect. 4. Here again, we restrict attention to a rather simple model setting taken 
from [17] and also used in [9]. As a result we build a clear connection between 
subsampling and the necessity for a correction term in the case high frequency data 
is assimilated directly. A brief numerical demonstration is provided in Sect. 5, which 
is followed by a concluding remark in Sect. 6. 


2 Ensemble Kalman Parameter Estimation 


We consider the SDE parameter estimation problem

$$ dX_t = f(X_t, \theta)\,dt + \gamma^{1/2}\,dW_t, \tag{1} $$

subject to observations $X_t^\dagger$, $t \in [0, T]$, which arise from the
reference system

$$ dX_t^\dagger = f^\dagger(X_t^\dagger)\,dt + \gamma^{1/2}\,dW_t^\dagger, \tag{2} $$

where the unknown drift function $f^\dagger(x)$ typically satisfies
$f^\dagger(x) = f(x, \theta^\dagger)$ and $\theta^\dagger$ denotes the true
parameter value. Here we assume for simplicity that the unknown parameter is
scalar-valued and that the state variable is d-dimensional with $d \ge 1$.
Furthermore, $W_t$ and $W_t^\dagger$ denote independent standard d-dimensional
Brownian motions and $\gamma > 0$ is the (known) diffusion constant.

Following the Bayesian paradigm, we treat the unknown parameter as a random
variable $\Theta$. Furthermore, we apply a sequential approach and update
$\Theta$ with the incoming data $X_t^\dagger$ as a function of time. Hence we
introduce the random variable $\Theta_t$, which obeys the Bayesian posterior
distribution given all observations $X_\tau^\dagger$, $\tau \in [0, t]$, up to
time $t > 0$. Furthermore, instead of exactly solving the time-continuous
Bayesian inference problem as specified by the associated Kushner-Stratonovich
equation [6, 26], we define the time evolution of $\Theta_t$ by an application of
the (deterministic) ensemble Kalman-Bucy filter (EnKBF) mean-field equations
[10, 26], which take the form

$$ d\Theta_t = \gamma^{-1}\,\pi_t\big[(\theta - \pi_t[\theta]) \otimes f(X_t^\dagger, \theta)\big]\, dI_t, \tag{3a} $$
$$ dI_t = dX_t^\dagger - \tfrac{1}{2}\big( f(X_t^\dagger, \Theta_t) + \pi_t[f(X_t^\dagger, \theta)] \big)\, dt, \tag{3b} $$

where $\pi_t$ denotes the probability density function (PDF) of $\Theta_t$ and
$\pi_t[g]$ the associated expectation value of a function $g(\theta)$. The column
vector $I_t$, defined by (3b), is called the innovation, while the row vector

$$ K_t(\pi_t) = \gamma^{-1}\,\pi_t\big[(\theta - \pi_t[\theta]) \otimes f(X_t^\dagger, \theta)\big], \tag{4} $$

premultiplying the innovation in (3a) is called the gain. Here the notation
$a \otimes b = a b^{\rm T}$, where a, b can be any two column vectors, has been
used. The initial condition $\Theta_0 \sim \pi_0$ is provided by the prior PDF of
the unknown parameter.

A Monte-Carlo implementation of the mean-field equations (3) leads to the
interacting particle system

$$ d\Theta_t^{(i)} = \gamma^{-1}\,\pi_t^M\big[(\theta - \pi_t^M[\theta]) \otimes f(X_t^\dagger, \theta)\big]\, dI_t^{(i)}, \tag{5a} $$
$$ dI_t^{(i)} = dX_t^\dagger - \tfrac{1}{2}\big( f(X_t^\dagger, \Theta_t^{(i)}) + \pi_t^M[f(X_t^\dagger, \theta)] \big)\, dt, \tag{5b} $$

$i = 1, \ldots, M$, where expectations are now taken with respect to the empirical
measure. That is,

$$ \pi_t^M[g] = \frac{1}{M} \sum_{i=1}^{M} g(\Theta_t^{(i)}) \tag{6} $$

for given function $g(\theta)$, and all Monte-Carlo samples are driven by the same
(fixed) observations $X_t^\dagger$. The initial samples $\Theta_0^{(i)}$,
$i = 1, \ldots, M$, are drawn identically and independently from the prior
distribution $\pi_0$.

We note in passing that there is also a stochastic variant of the innovation process
[26] defined by

$$ dI_t = dX_t^\dagger - f(X_t^\dagger, \Theta_t)\,dt - \gamma^{1/2}\,dW_t, \tag{7} $$

which leads to the Monte-Carlo approximation

$$ dI_t^{(i)} = dX_t^\dagger - f(X_t^\dagger, \Theta_t^{(i)})\,dt - \gamma^{1/2}\,dW_t^{(i)} \tag{8} $$

of the innovation in (5).


Remark 1 There is an intriguing connection to the stochastic gradient descent
approach to the estimation of $\theta^\dagger$, as proposed in [30], which is
written as

$$ d\theta_t = \frac{\alpha_t}{\gamma}\, \nabla_\theta f(X_t^\dagger, \theta_t)^{\rm T}\, dI_t, \tag{9a} $$
$$ dI_t = dX_t^\dagger - f(X_t^\dagger, \theta_t)\,dt \tag{9b} $$

in our notation, where $\alpha_t > 0$ denotes the learning rate. We note that (9)
shares with (3) the gain times innovation structure. However, while (3)
approximates the Bayesian inference problem, formulation (9) treats the parameter
estimation problem from an optimisation perspective. Both formulations share,
however, the discontinuous dependence on the observation path $X_t^\dagger$, and
the proposed frequentist analysis of the EnKBF (3) also applies in simplified form
to (9). We also point out that (3) is affine invariant [18] and does not require
the computation of partial derivatives.


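We now state a numerical implementation with step-size $\Delta t > 0$ and denote
the resulting numerical approximations at $t_n = n\Delta t$ by
$\Theta_n \sim \pi_n$, $n \ge 1$. While a standard Euler-Maruyama approximation
could be applied, the following stable discrete-time mean-field formulation of the
EnKBF

$$ \Theta_{n+1} = \Theta_n + K_n \Big\{ (X_{t_{n+1}}^\dagger - X_{t_n}^\dagger) - \tfrac{1}{2}\big( f(X_{t_n}^\dagger, \Theta_n) + \pi_n[f(X_{t_n}^\dagger, \theta)] \big)\Delta t \Big\} \tag{10} $$

is inspired by [3] with Kalman gain

$$ K_n = \pi_n\big[(\theta - \pi_n[\theta]) \otimes f(X_{t_n}^\dagger, \theta)\big] \times \tag{11a} $$
$$ \Big( \gamma + \Delta t\, \pi_n\big[ (f(X_{t_n}^\dagger, \theta) - \pi_n[f(X_{t_n}^\dagger, \theta)]) \otimes f(X_{t_n}^\dagger, \theta) \big] \Big)^{-1}. \tag{11b} $$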


Frequentist Perspective on Estimation Using the EnKF 241 


It is straightforward to combine this time discretisation with the Monte-Carlo 
approximation (5) in order to obtain a complete numerical implementation of the 
EnKBF. 


Remark 2 The rough path analysis of the EnKBF presented in [9] is based on a
Stratonovich reformulation of (3) and its appropriate time discretisation. Here we
follow the Itô/Euler-Maruyama formulation of the data-driven term in (3),

$$ \int_0^T g(X_t^\dagger, t)\, dX_t^\dagger = \lim_{\Delta t \to 0} \sum_{n=0}^{L-1} g(X_{t_n}^\dagger, t_n)\,(X_{t_{n+1}}^\dagger - X_{t_n}^\dagger) \tag{12} $$

for any continuous function $g(x, t)$ and $\Delta t = T/L$, as it corresponds to
the standard implementation of the EnKBF and is easier to analyse in the context
of this paper.


The EnKBF provides only an approximate solution to the Bayesian inference
problem for general nonlinear $f(x, \theta)$. However, it becomes exact in the
mean-field limit for affine drift functions $f(x, \theta) = \theta A x + B x + c$.


Example 1 Consider the stochastic partial differential equation

$$ \partial_t u = -U\,\partial_y u + \rho\,\partial_y^2 u + \dot{W} \tag{13} $$

over a periodic spatial domain $y \in [0, L)$, where $\dot{W}(t, y)$ denotes
space-time white noise, and $U \in \mathbb{R}$ and $\rho > 0$ are given
parameters. A standard finite-difference discretisation in space with d grid
points and mesh-size $\Delta y$ leads to a linear system of SDEs of the form

$$ du_t = -(U D + \rho D D^{\rm T})\,u_t\,dt + \Delta y^{-1/2}\,dW_t, \tag{14} $$

where $u_t \in \mathbb{R}^d$ denotes the vector of grid approximations at time t,
$D \in \mathbb{R}^{d \times d}$ a finite difference approximation of the spatial
derivative $\partial_y$, and $W_t$ the standard d-dimensional Brownian motion. We
can now set $X_t = u_t$, $\gamma = \Delta y^{-1}$ and identify either
$\theta = U$ or $\theta = \rho$ as the unknown parameter in order to obtain an SDE
of the form (1).


In this note, we further simplify our given inference problem to the case

$$ f(x, \theta) = \theta A x, \tag{15} $$

where $A \in \mathbb{R}^{d \times d}$ is a normal matrix with eigenvalues in the
left half plane. That is, $\sigma(A) \subset \mathbb{C}_-$. The reference
parameter value is set to $\theta^\dagger = 1$. Hence the SDE (2) possesses a
Gaussian invariant measure with mean zero and covariance matrix

$$ C = -\gamma\,(A + A^{\rm T})^{-1}. \tag{16} $$

We assume from now on that the observations $X_t^\dagger$ are realisations of (2)
with initial condition $X_0^\dagger \sim {\rm N}(0, C)$.
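The covariance formula (16) can be checked directly: the stationary covariance of a linear SDE of the form (2) with drift $Ax$ solves the Lyapunov equation $AC + CA^{\rm T} + \gamma I = 0$, and for a normal matrix A the closed form (16) satisfies it. The following is a minimal sketch in plain Python for $d = 2$; the helper names are illustrative only, and the test matrix is assumed to be the one used in the numerical example of Sect. 5.

```python
def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]

def inv2(a):
    """Inverse of a 2x2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

def stationary_cov(A, gamma):
    """C = -gamma (A + A^T)^{-1}, cf. (16)."""
    At = transpose(A)
    S = [[A[i][j] + At[i][j] for j in range(2)] for i in range(2)]
    Sinv = inv2(S)
    return [[-gamma * Sinv[i][j] for j in range(2)] for i in range(2)]

def lyapunov_residual(A, C, gamma):
    """A C + C A^T + gamma I, which vanishes for the stationary covariance."""
    AC = matmul(A, C)
    CAt = matmul(C, transpose(A))
    return [[AC[i][j] + CAt[i][j] + (gamma if i == j else 0.0)
             for j in range(2)] for i in range(2)]

# Assumed example: normal matrix with eigenvalues -(1 +/- i)/2.
A = [[-0.5, 0.5], [-0.5, -0.5]]
gamma = 1.0
C = stationary_cov(A, gamma)
R = lyapunov_residual(A, C, gamma)
```

For this choice of A the sketch returns the identity matrix for C and a vanishing residual.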
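Under these assumptions, the EnKBF (3) simplifies drastically, and we obtain

$$ d\Theta_t = \frac{\sigma_t}{\gamma}\,(A X_t^\dagger)^{\rm T}\, dI_t, \tag{17a} $$
$$ dI_t = dX_t^\dagger - \tfrac{1}{2}\big(\Theta_t + \pi_t[\theta]\big)\, A X_t^\dagger\, dt, \tag{17b} $$

with variance

$$ \sigma_t = \pi_t\big[(\theta - \pi_t[\theta])^2\big]. \tag{18} $$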


Remark 3 For completeness, we state the corresponding formulation for the
stochastic gradient descent approach (9):

$$ d\theta_t = \frac{\alpha_t}{\gamma}\,(A X_t^\dagger)^{\rm T}\, dI_t, \tag{19a} $$
$$ dI_t = dX_t^\dagger - \theta_t\, A X_t^\dagger\, dt. \tag{19b} $$

We find that the learning rate $\alpha_t$ takes the role of the variance
$\sigma_t$ in (17). However, we emphasise again that the same pathwise stochastic
integrals arise from both formulations, and therefore the same robustness issue
arises for the resulting estimators $\theta_t$, $t > 0$.


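Similarly, the discrete-time mean-field EnKBF (10) reduces to

$$ \Theta_{n+1} = \Theta_n + K_n \Big\{ (X_{t_{n+1}}^\dagger - X_{t_n}^\dagger) - \tfrac{1}{2}\big(\Theta_n + \pi_n[\theta]\big)\, A X_{t_n}^\dagger\, \Delta t \Big\} \tag{20} $$

with Kalman gain

$$ K_n = \sigma_n\,(A X_{t_n}^\dagger)^{\rm T} \Big( \gamma + \Delta t\, \sigma_n\, (A X_{t_n}^\dagger)^{\rm T} A X_{t_n}^\dagger \Big)^{-1}. \tag{21} $$

Furthermore, since $X_{t_n}^\dagger \sim {\rm N}(0, C)$,

$$ (A X_{t_n}^\dagger)^{\rm T} A X_{t_n}^\dagger = (A^{\rm T} A) : (X_{t_n}^\dagger \otimes X_{t_n}^\dagger) \approx (A^{\rm T} A) : C \tag{22} $$

for $d \gg 1$, and we may simplify the Kalman gain to

$$ K_n = \sigma_n\,(A X_{t_n}^\dagger)^{\rm T} \Big( \gamma + \Delta t\, \sigma_n\, (A^{\rm T} A) : C \Big)^{-1}. \tag{23} $$

Here we have used the notation $A : B = {\rm tr}(A^{\rm T} B)$ to denote the
Frobenius inner product of two matrices $A, B \in \mathbb{R}^{d \times d}$. The
approximation (22) becomes exact in the limit $d \to \infty$, which we will
frequently assume in the following section. Please note that

$$ K_n = \frac{\sigma_n}{\gamma}\,(A X_{t_n}^\dagger)^{\rm T} + O(\Delta t) \tag{24} $$

under the stated assumptions.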
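To make the update (20) with gain (21) concrete, the following minimal sketch applies one step to a small ensemble, replacing $\pi_n[\theta]$ and $\sigma_n$ by the empirical mean and variance as in (6). The function names and the noise-free test data are assumptions made for illustration, not part of the original text.

```python
def matvec(A, x):
    """Matrix-vector product for nested-list matrices."""
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

def enkbf_step(thetas, x_n, x_np1, A, gamma, dt):
    """One step of (20) with gain (21) for the linear drift f(x, theta) = theta*A*x.

    thetas: list of scalar ensemble members; x_n, x_np1: observed states.
    """
    M = len(thetas)
    mean = sum(thetas) / M                              # pi_n[theta]
    sigma = sum((th - mean) ** 2 for th in thetas) / M  # sigma_n, cf. (18)
    ax = matvec(A, x_n)
    q = sum(a * a for a in ax)                          # (A x)^T A x
    gain = [sigma * a / (gamma + dt * sigma * q) for a in ax]
    dx = [b - a for a, b in zip(x_n, x_np1)]
    updated = []
    for th in thetas:
        innovation = [d - 0.5 * (th + mean) * a * dt for d, a in zip(dx, ax)]
        updated.append(th + sum(g * v for g, v in zip(gain, innovation)))
    return updated

# Hypothetical test data: a noise-free increment generated with theta = 1.
A = [[-0.5, 0.5], [-0.5, -0.5]]
x_n = [1.0, 0.0]
dt = 0.1
x_np1 = [x + a * dt for x, a in zip(x_n, matvec(A, x_n))]
ensemble = enkbf_step([0.0, 2.0], x_n, x_np1, A, gamma=1.0, dt=dt)
```

Because the data increment is consistent with the ensemble mean $\theta = 1$, the mean stays at one while the ensemble spread contracts.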


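Remark 4 The Stratonovich reformulation of (17) replaces (17a) by

$$ d\Theta_t = \frac{\sigma_t}{\gamma} \Big\{ (A X_t^\dagger)^{\rm T} \circ dI_t - \frac{\gamma}{2}\,{\rm tr}(A)\, dt \Big\}. \tag{25} $$

The innovation $I_t$ remains as before. See Appendix B of [9] for more details. An
appropriate time discretisation of the innovation-driven term replaces the Kalman
gain (21) by

$$ K_{n+1/2} = \sigma_n\,(A X_{t_{n+1/2}}^\dagger)^{\rm T} \Big( \gamma + \Delta t\, \sigma_n\, (A X_{t_{n+1/2}}^\dagger)^{\rm T} A X_{t_{n+1/2}}^\dagger \Big)^{-1}, \tag{26} $$

where

$$ X_{t_{n+1/2}}^\dagger = \tfrac{1}{2}\big( X_{t_n}^\dagger + X_{t_{n+1}}^\dagger \big). \tag{27} $$

Please note that a midpoint discretisation of the data-driven term in (25) results in

$$ (A X_{t_{n+1/2}}^\dagger)^{\rm T} (X_{t_{n+1}}^\dagger - X_{t_n}^\dagger) = (A X_{t_n}^\dagger)^{\rm T} (X_{t_{n+1}}^\dagger - X_{t_n}^\dagger) \; + \tag{28a} $$
$$ \tfrac{1}{2}\, A^{\rm T} : (X_{t_{n+1}}^\dagger - X_{t_n}^\dagger) \otimes (X_{t_{n+1}}^\dagger - X_{t_n}^\dagger) \tag{28b} $$

and that

$$ \tfrac{1}{2}\, A^{\rm T} : (X_{t_{n+1}}^\dagger - X_{t_n}^\dagger) \otimes (X_{t_{n+1}}^\dagger - X_{t_n}^\dagger) \approx \frac{\Delta t\, \gamma}{2}\, {\rm tr}(A), \tag{29} $$

which justifies the additional drift term in (25). A precise meaning of the
approximation in (29) will be given in Remark 5 below.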
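The gap between the midpoint and the left-point (Itô) evaluation in (28) can be observed numerically. The sketch below, which assumes an Euler-Maruyama simulation of (2) with drift $Ax$ and illustrative function names, accumulates both sums along one path; by (29) their difference should be close to $(T\gamma/2)\,{\rm tr}(A)$ over an interval of length T.

```python
import math
import random

def matvec2(A, x):
    return [A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1]]

def simulate_path(A, gamma, dt, n_steps, rng):
    """Euler-Maruyama path of dX = A X dt + gamma^{1/2} dW, started at zero."""
    x = [0.0, 0.0]
    path = [x]
    for _ in range(n_steps):
        ax = matvec2(A, x)
        x = [x[i] + ax[i] * dt + math.sqrt(gamma * dt) * rng.gauss(0.0, 1.0)
             for i in range(2)]
        path.append(x)
    return path

def ito_and_midpoint_sums(A, path):
    """Left-point and midpoint sums approximating int (A X)^T dX."""
    ito = 0.0
    mid = 0.0
    for xa, xb in zip(path[:-1], path[1:]):
        dx = [xb[0] - xa[0], xb[1] - xa[1]]
        xm = [0.5 * (xa[0] + xb[0]), 0.5 * (xa[1] + xb[1])]
        a_left = matvec2(A, xa)
        a_mid = matvec2(A, xm)
        ito += a_left[0] * dx[0] + a_left[1] * dx[1]
        mid += a_mid[0] * dx[0] + a_mid[1] * dx[1]
    return ito, mid

A = [[-0.5, 0.5], [-0.5, -0.5]]   # assumed example matrix, tr(A) = -1
gamma, dt, n_steps = 1.0, 1e-3, 1000
path = simulate_path(A, gamma, dt, n_steps, random.Random(1))
ito, mid = ito_and_midpoint_sums(A, path)
gap = mid - ito                    # expected near 0.5 * T * gamma * tr(A)
```

With $T = 1$, $\gamma = 1$ and ${\rm tr}(A) = -1$, the gap fluctuates around $-0.5$ with a standard deviation of a few hundredths.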


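Alternatively, if one wishes to explicitly utilise the availability of
continuous-time data $X_t^\dagger$, one could apply the following variant of (20):

$$ \Theta_{n+1} = \Theta_n + \frac{\sigma_n}{\gamma} \int_{t_n}^{t_{n+1}} (A X_t^\dagger)^{\rm T}\, dX_t^\dagger - \tfrac{1}{2}\, K_n A X_{t_n}^\dagger \big(\Theta_n + \pi_n[\theta]\big)\, \Delta t, \tag{30} $$

and, following the Itô/Euler-Maruyama approximation (12), discretise the integral
with a small inner step-size $\Delta\tau = \Delta t / L$, $L \gg 1$; that is,

$$ \int_{t_n}^{t_{n+1}} (A X_t^\dagger)^{\rm T}\, dX_t^\dagger \approx \sum_{l=0}^{L-1} (A X_{\tau_l}^\dagger)^{\rm T} (X_{\tau_{l+1}}^\dagger - X_{\tau_l}^\dagger) \tag{31} $$

with $\tau_l = t_n + l\,\Delta\tau$. We note that

$$ \sum_{l=0}^{L-1} (A X_{\tau_l}^\dagger)^{\rm T} (X_{\tau_{l+1}}^\dagger - X_{\tau_l}^\dagger) = (A X_{t_n}^\dagger)^{\rm T} (X_{t_{n+1}}^\dagger - X_{t_n}^\dagger) \; + \tag{32a} $$
$$ A^{\rm T} : \sum_{l=0}^{L-1} (X_{\tau_l}^\dagger - X_{t_n}^\dagger) \otimes (X_{\tau_{l+1}}^\dagger - X_{\tau_l}^\dagger), \tag{32b} $$

which is at the heart of rough path analysis [13] and which we utilise in the
following section.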
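The decomposition (32) is a purely algebraic identity and holds for any discrete path, not only for SDE solutions. The following minimal sketch verifies it on an arbitrary deterministic path, using $A^{\rm T} : (a \otimes b) = (Aa)^{\rm T} b$; the path and the function names are chosen for illustration only.

```python
import math

def frob_AT(A, a, b):
    """A^T : (a (x) b), which equals (A a)^T b."""
    Aa = [sum(A[i][j] * a[j] for j in range(len(a))) for i in range(len(A))]
    return sum(u * v for u, v in zip(Aa, b))

def diff(x, y):
    """Increment y - x of two states."""
    return [q - p for p, q in zip(x, y)]

def both_sides_of_32(A, path):
    """Return the left- and right-hand side of (32) for a discrete path."""
    x0, xL = path[0], path[-1]
    pairs = list(zip(path[:-1], path[1:]))
    lhs = sum(frob_AT(A, xa, diff(xa, xb)) for xa, xb in pairs)
    first_order = frob_AT(A, x0, diff(x0, xL))
    second_order = sum(frob_AT(A, diff(x0, xa), diff(xa, xb)) for xa, xb in pairs)
    return lhs, first_order + second_order

A = [[-0.5, 0.5], [-0.5, -0.5]]
path = [[math.cos(0.1 * l), math.sin(0.2 * l)] for l in range(51)]
lhs, rhs = both_sides_of_32(A, path)
```

The two sides agree up to rounding error, which confirms that the whole difference between high-frequency assimilation and a single outer step is carried by the second-order term (32b).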


3 Frequentist Analysis 


It is well known that the second-order contribution in (32) leads to a
discontinuous dependence of the integral on the observed $X_t^\dagger$ in the
uniform norm topology on the space of continuous functions. Rough path theory
fixes this problem by defining appropriately extended topologies and has been
extended to the EnKBF in [9]. In this section, we complement the path-wise
analysis from [9] by an analysis of the impact of the second-order contribution on
the EnKBF (17) from a frequentist perspective, which analyses the behaviour of the
EnKBF over all possible observations $X_t^\dagger$ subject to (2). In other words,
one switches from a strong solution concept to a weak one. While we assume
throughout this section that the observations satisfy (2), we will analyse the
impact of a perturbed observation process on the EnKBF in Sect. 4.

We first derive evolution equations for the conditional mean and variance under
the assumption that $\Theta_0$ is Gaussian distributed with given prior mean
$m_{\rm prior}$ and variance $\sigma_{\rm prior}$. It follows directly from (17)
that the conditional mean $\mu_t = \pi_t[\theta]$, that is the mean of
$\Theta_t$, satisfies the SDE

$$ d\mu_t = \frac{\sigma_t}{\gamma} \Big( (A X_t^\dagger)^{\rm T}\, dX_t^\dagger - \mu_t\, (A^{\rm T} A) : (X_t^\dagger \otimes X_t^\dagger)\, dt \Big), \tag{33} $$

which simplifies to

$$ d\mu_t = \frac{\sigma_t}{\gamma} \Big( (A X_t^\dagger)^{\rm T}\, dX_t^\dagger - \mu_t\, (A^{\rm T} A) : C\, dt \Big) \tag{34} $$

under the approximation (22). The initial condition is $\mu_0 = m_{\rm prior}$.
The evolution equation for the conditional variance, that is the variance of
$\Theta_t$, is given by

$$ \frac{d}{dt} \sigma_t = -\frac{\sigma_t^2}{\gamma}\, (A^{\rm T} A) : (X_t^\dagger \otimes X_t^\dagger) \tag{35} $$

with initial condition $\sigma_0 = \sigma_{\rm prior}$, and which again reduces to

$$ \frac{d}{dt} \sigma_t = -\frac{\sigma_t^2}{\gamma}\, (A^{\rm T} A) : C \tag{36} $$

under the approximation (22).

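We now perform a frequentist analysis of the estimator $\mu_t$ defined by (34) and
(36), that is, we perform a weak analysis of the SDE (34) in terms of the first
two moments of $\mu_t$ [29]. In the first step, we take the expectation of (34)
over all realisations $X_t^\dagger$ of the SDE (2), which we denote by

$$ m_t := \mathbb{E}^\dagger[\mu_t]. \tag{37} $$

The associated evolution equation is given by

$$ \frac{d}{dt} m_t = \frac{\sigma_t}{\gamma}\, (A^{\rm T} A) : \mathbb{E}^\dagger\big[ X_t^\dagger \otimes X_t^\dagger \big] - \frac{\sigma_t}{\gamma}\, (A^{\rm T} A) : C\, m_t, \tag{38} $$

which reduces to

$$ \frac{d}{dt} m_t = \frac{\sigma_t}{\gamma}\, (A^{\rm T} A) : C\, (1 - m_t) = -\sigma_t\, (A^{\rm T} A) : (A + A^{\rm T})^{-1}\, (1 - m_t). \tag{39} $$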


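In the second step, we also look at the frequentist variance

$$ p_t := \mathbb{E}^\dagger\big[ (\mu_t - m_t)^2 \big]. \tag{40} $$

Using

$$ d(\mu_t - m_t) = \frac{\sigma_t}{\gamma} \Big\{ (A^{\rm T} A) : (X_t^\dagger \otimes X_t^\dagger - C)\, dt + \gamma^{1/2} (A X_t^\dagger)^{\rm T}\, dW_t^\dagger \Big\} \; - \tag{41a} $$
$$ \frac{\sigma_t}{\gamma}\, (A^{\rm T} A) : C\, (\mu_t - m_t)\, dt, \tag{41b} $$

we obtain

$$ \frac{d}{dt} p_t = \frac{\sigma_t}{\gamma}\, (A^{\rm T} A) : C\, (\sigma_t - 2 p_t) \; + \tag{42a} $$
$$ \frac{2 \sigma_t}{\gamma}\, \mathbb{E}^\dagger\big[ (A^{\rm T} A) : (X_t^\dagger \otimes X_t^\dagger - C)\, (\mu_t - m_t) \big], \tag{42b} $$

which we simplify to

$$ \frac{d}{dt} p_t = \frac{\sigma_t}{\gamma}\, (A^{\rm T} A) : C\, (\sigma_t - 2 p_t) = -\sigma_t\, (A^{\rm T} A) : (A + A^{\rm T})^{-1}\, (\sigma_t - 2 p_t) \tag{43} $$

under the approximation (22). The initial conditions are $m_0 = m_{\rm prior}$ and
$p_0 = 0$, respectively. We note that the differential equations (36) and (43) are
explicitly solvable. For example, it holds that

$$ \sigma_t = \frac{\sigma_0}{1 - (A^{\rm T} A) : (A^{\rm T} + A)^{-1}\, \sigma_0\, t}, \tag{44} $$

and one finds that $\sigma_t \sim -1 / \big( (A^{\rm T} A) : (A^{\rm T} + A)^{-1}\, t \big)$
for $t \gg 1$; note that $(A^{\rm T} A) : (A^{\rm T} + A)^{-1} = -\gamma^{-1} (A^{\rm T} A) : C < 0$
by (16). It can also be shown that $p_t \le \sigma_t$ for all $t \ge 0$.
Furthermore, this analysis suggests that the learning rate in the stochastic
gradient descent formulation (19) should be chosen as

$$ \alpha_t = \min\Big\{ \bar{\alpha},\; \frac{-1}{(A^{\rm T} A) : (A^{\rm T} + A)^{-1}\, t} \Big\}, \tag{45} $$

where $\bar{\alpha} > 0$ denotes an initial learning rate; for example
$\bar{\alpha} = \sigma_0$.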
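The closed-form variance (44) and the learning-rate schedule (45) are easy to evaluate once the scalar rate $q := \gamma^{-1} (A^{\rm T} A) : C = -(A^{\rm T} A) : (A^{\rm T} + A)^{-1}$ is known. The following minimal sketch, with illustrative function names and the hypothetical values $q = 1$ and $\sigma_0 = 4$, compares the closed form with a forward Euler integration of (36).

```python
def sigma_closed(t, sigma0, q):
    """Closed-form solution of d(sigma)/dt = -q sigma^2, cf. (44)."""
    return sigma0 / (1.0 + q * sigma0 * t)

def sigma_euler(T, sigma0, q, n_steps):
    """Forward Euler integration of (36) written as d(sigma)/dt = -q sigma^2."""
    dt = T / n_steps
    s = sigma0
    for _ in range(n_steps):
        s -= q * s * s * dt
    return s

def learning_rate(t, alpha_bar, q):
    """Schedule (45): capped at alpha_bar, decaying like 1/(q t) for large t."""
    if t <= 0.0:
        return alpha_bar
    return min(alpha_bar, 1.0 / (q * t))

# Hypothetical values: q = 1, sigma0 = 4, T = 6.
s_exact = sigma_closed(6.0, 4.0, 1.0)
s_num = sigma_euler(6.0, 4.0, 1.0, 60000)
rates = [learning_rate(t, 4.0, 1.0) for t in (0.0, 0.1, 1.0, 6.0)]
```

The schedule keeps the initial rate $\bar{\alpha}$ until $t = 1/(q\bar{\alpha})$ and then follows the asymptotic decay of $\sigma_t$.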

We finally conduct a formal analysis of the ensemble Kalman filter time-stepping
(20) and demonstrate that the method is first-order accurate with regard to the
implied frequentist mean $m_t$. We recall (24) and conclude from (20) that the
implied update on the variance $\sigma_n$ satisfies

$$ \sigma_{n+1} = \sigma_n - \frac{\sigma_n^2}{\gamma}\, (A^{\rm T} A) : C\, \Delta t + O(\Delta t^2), \tag{46} $$

which provides a first-order approximation to (36).
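The first-order claim (46) can be illustrated with a one-step comparison. Under the approximation (22), the deviations $\Theta_n - \pi_n[\theta]$ in (20) are multiplied by $1 - \tfrac{1}{2} s / (\gamma + s)$ with $s = \Delta t\, \sigma_n\, (A^{\rm T} A) : C$, so the variance is multiplied by the square of that factor. This factor is a derivation made for this sketch rather than a formula quoted from the text, and the function names are illustrative.

```python
def sigma_step(sigma, q, gamma, dt):
    """Variance after one step of (20) with gain (23); q stands for (A^T A):C."""
    s = dt * sigma * q
    return sigma * (1.0 - 0.5 * s / (gamma + s)) ** 2

def sigma_exact(sigma, q, gamma, dt):
    """Exact flow of (36) over one step of length dt."""
    return sigma / (1.0 + q * sigma * dt / gamma)

def local_error(sigma, q, gamma, dt):
    return abs(sigma_step(sigma, q, gamma, dt) - sigma_exact(sigma, q, gamma, dt))

# Halving the step-size should reduce the one-step error by roughly a factor of four.
err_coarse = local_error(4.0, 1.0, 1.0, 0.01)
err_fine = local_error(4.0, 1.0, 1.0, 0.005)
ratio = err_coarse / err_fine
```

A local error of order $\Delta t^2$ corresponds to a global error of order $\Delta t$, in line with (46).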
We next analyse the evolution equation (34) for the conditional mean $\mu_t$ and
its numerical approximation

$$ \mu_{n+1} = \mu_n + K_n \Big\{ (X_{t_{n+1}}^\dagger - X_{t_n}^\dagger) - \mu_n\, A X_{t_n}^\dagger\, \Delta t \Big\} \tag{47} $$

arising from (20). Here we follow [13] in order to analyse the impact of the data
$X_t^\dagger$ on the estimator. An in-depth theoretical treatment can be found in
[9].
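Comparing (47) to (34) and utilising (24), we find that the key quantity of
interest is

$$ J^\dagger_{t_n, t_{n+1}} := \int_{t_n}^{t_{n+1}} (A X_t^\dagger)^{\rm T}\, dX_t^\dagger, \tag{48} $$

which we can rewrite as

$$ J^\dagger_{t_n, t_{n+1}} = A^{\rm T} : \big( X_{t_n}^\dagger \otimes X^\dagger_{t_n, t_{n+1}} \big) + A^{\rm T} : \mathbb{X}^\dagger_{t_n, t_{n+1}}. \tag{49} $$

Here, motivated by (32) and following standard rough path notation, we have used

$$ X^\dagger_{t_n, t_{n+1}} := X^\dagger_{t_{n+1}} - X^\dagger_{t_n} \tag{50} $$

and the second-order iterated Itô integral

$$ \mathbb{X}^\dagger_{t_n, t_{n+1}} := \int_{t_n}^{t_{n+1}} (X_t^\dagger - X_{t_n}^\dagger) \otimes dX_t^\dagger. \tag{51} $$

The difference between the integral (48) and its corresponding approximation in
(47) is provided by $A^{\rm T} : \mathbb{X}^\dagger_{t_n, t_{n+1}}$ plus
higher-order terms arising from (24).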
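The iterated integral $\mathbb{X}^\dagger_{t_n, t_{n+1}}$ becomes a random variable
from the frequentist perspective. Taking note of (2), we find that the drift,
$f^\dagger(x) = Ax$, contributes terms of order $O(\Delta t^2)$ to
$\mathbb{X}^\dagger_{t_n, t_{n+1}}$, and the expected value of
$\mathbb{X}^\dagger_{t_n, t_{n+1}}$ therefore satisfies

$$ \mathbb{E}^\dagger\big[ \mathbb{X}^\dagger_{t_n, t_{n+1}} \big] = O(\Delta t^2), \tag{52} $$

since $\mathbb{E}^\dagger[W^\dagger_{t_n, \tau}] = 0$ for $\tau > t_n$, and

$$ \mathbb{E}^\dagger\big[ \mathbb{W}^\dagger_{t_n, t_{n+1}} \big] = \tfrac{1}{2}\, \mathbb{E}^\dagger\big[ W^\dagger_{t_n, t_{n+1}} \otimes W^\dagger_{t_n, t_{n+1}} \big] - \tfrac{1}{2}\, \mathbb{E}^\dagger\big[ [W^\dagger_{t_n}, W^\dagger_{t_{n+1}}] \big] - \frac{\Delta t}{2}\, I = 0, \tag{53} $$

where we have introduced the commutator

$$ [W^\dagger_{t_n}, W^\dagger_{t_{n+1}}] := W^\dagger_{t_n} \otimes W^\dagger_{t_{n+1}} - W^\dagger_{t_{n+1}} \otimes W^\dagger_{t_n}. \tag{54} $$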


Hence we find that, while (47) is not a first-order (strong) approximation of the
SDE (34), the approximation becomes first-order in $m_t$ when averaged over
realisations $X_t^\dagger$ of the SDE (2). More precisely, one obtains

$$ \mathbb{E}^\dagger\big[ J^\dagger_{t_n, t_{n+1}} \big] = (A^{\rm T} A) : C\, \Delta t + O(\Delta t^2). \tag{55} $$
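The vanishing mean of the iterated Itô integral of Brownian motion, which underlies (52) and (53), can be checked by simple Monte Carlo sampling. The sketch below uses left-point sums for the Itô version and, for contrast, midpoint sums for the Stratonovich version, whose mean is $(\Delta t/2) I$. Function names, sample sizes, and the seed are assumptions chosen for illustration.

```python
import math
import random

def iterated_sums(L, dt, rng):
    """Left-point (Ito) and midpoint (Stratonovich) iterated sums of a
    two-dimensional Brownian motion over an interval of length dt."""
    dtau = dt / L
    w = [0.0, 0.0]
    ito = [[0.0, 0.0], [0.0, 0.0]]
    strat = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(L):
        dw = [math.sqrt(dtau) * rng.gauss(0.0, 1.0) for _k in range(2)]
        for i in range(2):
            for j in range(2):
                ito[i][j] += w[i] * dw[j]
                strat[i][j] += (w[i] + 0.5 * dw[i]) * dw[j]
        w = [w[0] + dw[0], w[1] + dw[1]]
    return ito, strat

def mean_iterated(n_samples, L, dt, seed=0):
    """Sample means of both iterated sums."""
    rng = random.Random(seed)
    m_ito = [[0.0, 0.0], [0.0, 0.0]]
    m_strat = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(n_samples):
        ito, strat = iterated_sums(L, dt, rng)
        for i in range(2):
            for j in range(2):
                m_ito[i][j] += ito[i][j] / n_samples
                m_strat[i][j] += strat[i][j] / n_samples
    return m_ito, m_strat

m_ito, m_strat = mean_iterated(n_samples=2000, L=20, dt=1.0)
```

All entries of the Itô sample mean scatter around zero, while the diagonal of the Stratonovich sample mean scatters around $\Delta t / 2$.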
We note that the modified scheme (30) leads to the same time evolution in the
variance $\sigma_n$, while the update in $\mu_n$ is changed to

$$ \mu_{n+1} = \mu_n + \frac{\sigma_n}{\gamma} \int_{t_n}^{t_{n+1}} (A X_t^\dagger)^{\rm T}\, dX_t^\dagger - K_n A X_{t_n}^\dagger\, \mu_n\, \Delta t. \tag{56} $$

This modification results in a more accurate evolution in the conditional mean
$\mu_n$, but because of (52) it does not impact to leading order the evolution of
the underlying frequentist mean, $m_n = \mathbb{E}^\dagger[\mu_n]$. We summarise
our findings in the following proposition.


Proposition 1 The discrete-time EnKBF implementations (20) and (30) both provide
first-order approximations to the time evolution of the frequentist mean, $m_t$,
and the frequentist variance, $p_t$. In other words, both methods converge weakly
with order one.


We also note that the frequentist uncertainty is essentially data-independent and
depends only on the time window [0, T] over which the data gets observed. Hence,
for fixed observation interval [0, T], it makes sense to choose the step-size
$\Delta t$ such that the discretisation error (bias) remains on the same order of
magnitude as $p_T^{1/2} \approx \sigma_T^{1/2}$. Selecting a much smaller
step-size would not significantly reduce the frequentist estimation error in the
conditional estimator $\mu_T$.


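Remark 5 We can now give a precise reformulation of the approximation (29):

$$ \tfrac{1}{2}\, \mathbb{E}^\dagger\big[ A^{\rm T} : \big( X^\dagger_{t_n, t_{n+1}} \otimes X^\dagger_{t_n, t_{n+1}} \big) \big] = \frac{\Delta t\, \gamma}{2}\, {\rm tr}(A) + O(\Delta t^2), \tag{57} $$

which is at the heart of the Stratonovich formulation (25) of the EnKBF [9].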


4 Multi-Scale Data 


We now have all the material in place to study the dependency of the EnKBF
estimator on a set of observations $X_t^{(\epsilon)}$, $\epsilon > 0$, which
approach the theoretical $X_t^\dagger$ with respect to the uniform norm topology
on the space of continuous functions as $\epsilon \to 0$. Since the second-order
contribution in (32), that is (51), does not depend continuously on such
perturbations, we demonstrate in this section that a systematic bias arises in the
EnKBF. Furthermore, we show how the bias can be eliminated either via subsampling
the data, which effectively amounts to ignoring these second-order contributions,
or via an appropriate correction term, which ensures a continuous dependence on
observations $X_t^{(\epsilon)}$ with respect to the uniform norm topology. More
specifically, we investigate the impact of a possible discrepancy between the SDE
model (1), for which we aim to estimate the parameter $\theta$, and the data
generating SDE (2). We therefore replace (2) by the following two-scale SDE [17]:

$$ dX_t^{(\epsilon)} = A X_t^{(\epsilon)}\, dt + \frac{\gamma^{1/2}}{\epsilon}\, M P_t^{(\epsilon)}\, dt, \tag{58a} $$
$$ dP_t^{(\epsilon)} = -\frac{1}{\epsilon}\, M P_t^{(\epsilon)}\, dt + dW_t^\dagger, \tag{58b} $$

where

$$ M = \begin{pmatrix} 1 & \beta \\ -\beta & 1 \end{pmatrix}, \tag{59} $$

$\beta = 2$ and $\epsilon = 0.01$. The dimension of state space is d = 2 throughout
this section. While we restrict here to the simple two-scale model (58), similar
scenarios can arise from deterministic fast-slow systems [24, 7].

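The associated EnKBF mean-field equations in the parameter $\Theta_t$, which we now
denote by $\Theta_t^{(\epsilon)}$ in order to explicitly record its dependence on
the scale parameter $\epsilon \ll 1$, become

$$ d\Theta_t^{(\epsilon)} = \frac{\sigma_t^{(\epsilon)}}{\gamma}\, (A X_t^{(\epsilon)})^{\rm T}\, dI_t^{(\epsilon)}, \tag{60a} $$
$$ dI_t^{(\epsilon)} = dX_t^{(\epsilon)} - \tfrac{1}{2}\big( \Theta_t^{(\epsilon)} + \pi_t^{(\epsilon)}[\theta] \big)\, A X_t^{(\epsilon)}\, dt, \tag{60b} $$

with variance

$$ \sigma_t^{(\epsilon)} = \pi_t^{(\epsilon)}\big[ (\theta - \pi_t^{(\epsilon)}[\theta])^2 \big] \tag{61} $$

and $\Theta_0^{(\epsilon)} \sim \pi_0$. The discrete-time mean-field EnKBF (20)
turns into

$$ \Theta_{n+1}^{(\epsilon)} = \Theta_n^{(\epsilon)} + K_n^{(\epsilon)} \Big\{ (X_{t_{n+1}}^{(\epsilon)} - X_{t_n}^{(\epsilon)}) - \tfrac{1}{2}\big( \Theta_n^{(\epsilon)} + \pi_n^{(\epsilon)}[\theta] \big)\, A X_{t_n}^{(\epsilon)}\, \Delta t \Big\} \tag{62} $$

with Kalman gain

$$ K_n^{(\epsilon)} = \sigma_n^{(\epsilon)}\, (A X_{t_n}^{(\epsilon)})^{\rm T} \Big( \gamma + \Delta t\, \sigma_n^{(\epsilon)}\, (A X_{t_n}^{(\epsilon)})^{\rm T} A X_{t_n}^{(\epsilon)} \Big)^{-1}. \tag{63} $$

We also consider the appropriately modified scheme (30):

$$ \Theta_{n+1}^{(\epsilon)} = \Theta_n^{(\epsilon)} + \frac{\sigma_n^{(\epsilon)}}{\gamma} \int_{t_n}^{t_{n+1}} (A X_t^{(\epsilon)})^{\rm T}\, dX_t^{(\epsilon)} - \tfrac{1}{2}\, K_n^{(\epsilon)} A X_{t_n}^{(\epsilon)} \big( \Theta_n^{(\epsilon)} + \pi_n^{(\epsilon)}[\theta] \big)\, \Delta t. \tag{64} $$

In order to understand the impact of the modified data generating process on the
two mean-field EnKBF formulations (62) and (64), respectively, we follow [17] and
investigate the difference between $X_t^{(\epsilon)}$ and $X_t^\dagger$: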
Fig. 1 SDE driven by mathematical vs. physical Brownian motion ($\epsilon = 0.01$).
The top panel displays both $X_t^\dagger$ (blue) and $X_t^{(\epsilon)}$ (red) over
the long time interval $t \in [0, 10]$, while the lower panel provides a zoomed-in
perspective over the interval $t \in [0, 1]$


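$$ d(X_t^{(\epsilon)} - X_t^\dagger) = A (X_t^{(\epsilon)} - X_t^\dagger)\, dt + \frac{\gamma^{1/2}}{\epsilon}\, M P_t^{(\epsilon)}\, dt - \gamma^{1/2}\, dW_t^\dagger \tag{65a} $$
$$ = A (X_t^{(\epsilon)} - X_t^\dagger)\, dt - \gamma^{1/2}\, dP_t^{(\epsilon)}. \tag{65b} $$

When $P_t^{(\epsilon)}$ is stationary, it is Gaussian with mean zero and covariance

$$ \mathbb{E}^\dagger\big[ P_t^{(\epsilon)} \otimes P_t^{(\epsilon)} \big] = \epsilon\, (M + M^{\rm T})^{-1} = \frac{\epsilon}{2}\, I. \tag{66} $$

Hence $P_t^{(\epsilon)} \to 0$ as $\epsilon \to 0$ and also

$$ X_t^{(\epsilon)} \to X_t^\dagger \tag{67} $$

in $L^2$ uniformly in t, provided $\sigma(A) \subset \mathbb{C}_-$ and
$X_0^{(\epsilon)} = X_0^\dagger$. This is illustrated in Fig. 1.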
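The convergence (67) can be observed in a small simulation. The sketch below integrates the reference SDE (2) and the two-scale system (58) with forward Euler steps driven by the same Brownian increments and records the largest distance between the two paths. The matrices, the zero initial conditions, the step-size, and the seed are assumptions made for illustration; the off-diagonal signs of A and M follow the reading of (87) and (59) adopted here.

```python
import math
import random

def matvec2(A, x):
    return [A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1]]

def max_path_distance(A, M, gamma, eps, T, dt, seed=0):
    """Euler simulation of (2) and (58) with shared noise; returns sup_t |X_eps - X_ref|."""
    rng = random.Random(seed)
    x_ref = [0.0, 0.0]
    x_eps = [0.0, 0.0]
    p = [0.0, 0.0]
    worst = 0.0
    for _ in range(int(round(T / dt))):
        dw = [math.sqrt(dt) * rng.gauss(0.0, 1.0) for _k in range(2)]
        a_ref = matvec2(A, x_ref)
        a_eps = matvec2(A, x_eps)
        mp = matvec2(M, p)
        x_ref = [x_ref[i] + a_ref[i] * dt + math.sqrt(gamma) * dw[i] for i in range(2)]
        x_eps = [x_eps[i] + a_eps[i] * dt + math.sqrt(gamma) / eps * mp[i] * dt
                 for i in range(2)]
        p = [p[i] - mp[i] * dt / eps + dw[i] for i in range(2)]
        worst = max(worst, math.hypot(x_eps[0] - x_ref[0], x_eps[1] - x_ref[1]))
    return worst

A = [[-0.5, 0.5], [-0.5, -0.5]]
M = [[1.0, 2.0], [-2.0, 1.0]]
distance = max_path_distance(A, M, gamma=1.0, eps=0.01, T=1.0, dt=5e-4)
```

By (65b) the difference between the two paths is essentially $-\gamma^{1/2} P_t^{(\epsilon)}$, so its size is governed by the stationary standard deviation $(\epsilon/2)^{1/2}$ from (66).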
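In order to investigate the problem further, we study the integral

$$ J^{(\epsilon)}_{t_n, t_{n+1}} := \int_{t_n}^{t_{n+1}} (A X_t^{(\epsilon)})^{\rm T}\, dX_t^{(\epsilon)} \tag{68} $$

and its relation to (48). As for (48), we can rewrite (68) as

$$ J^{(\epsilon)}_{t_n, t_{n+1}} = A^{\rm T} : \big( X_{t_n}^{(\epsilon)} \otimes X^{(\epsilon)}_{t_n, t_{n+1}} \big) + A^{\rm T} : \mathbb{X}^{(\epsilon)}_{t_n, t_{n+1}}. \tag{69} $$

We now investigate the limit of the second-order iterated integral

$$ \mathbb{X}^{(\epsilon)}_{t_n, t_{n+1}} := \int_{t_n}^{t_{n+1}} X^{(\epsilon)}_{t_n, t} \otimes dX_t^{(\epsilon)} \tag{70a} $$
$$ = \tfrac{1}{2}\, X^{(\epsilon)}_{t_n, t_{n+1}} \otimes X^{(\epsilon)}_{t_n, t_{n+1}} - \tfrac{1}{2} \int_{t_n}^{t_{n+1}} \big[ X^{(\epsilon)}_{t_n, t},\, dX_t^{(\epsilon)} \big] \tag{70b} $$

as $\epsilon \to 0$ [17]. Here $[\cdot, \cdot]$ denotes the commutator defined by (54).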
) 


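Proposition 2 The second-order iterated integral
$\mathbb{X}^{(\epsilon)}_{t_n, t_{n+1}}$ satisfies

$$ \lim_{\epsilon \to 0} \mathbb{X}^{(\epsilon)}_{t_n, t_{n+1}} = \mathbb{X}^\dagger_{t_n, t_{n+1}} + \frac{\Delta t\, \gamma}{2}\, M. \tag{71} $$

Proof The proof follows [17] and can be summarised as follows:

$$ \mathbb{X}^{(\epsilon)}_{t_n, t_{n+1}} = \int_{t_n}^{t_{n+1}} X^{(\epsilon)}_{t_n, t} \otimes dX_t^{(\epsilon)} \tag{72a} $$
$$ \to \int_{t_n}^{t_{n+1}} X^\dagger_{t_n, t} \otimes dX_t^\dagger - \gamma^{1/2} \int_{t_n}^{t_{n+1}} X^{(\epsilon)}_{t_n, t} \otimes dP_t^{(\epsilon)} \tag{72b} $$
$$ = \mathbb{X}^\dagger_{t_n, t_{n+1}} - \gamma^{1/2}\, X^{(\epsilon)}_{t_n, t_{n+1}} \otimes P^{(\epsilon)}_{t_{n+1}} + \gamma^{1/2} \int_{t_n}^{t_{n+1}} dX_t^{(\epsilon)} \otimes P_t^{(\epsilon)} \tag{72c} $$
$$ \to \mathbb{X}^\dagger_{t_n, t_{n+1}} + \gamma^{1/2} \int_{t_n}^{t_{n+1}} \Big( A X_t^{(\epsilon)} + \frac{\gamma^{1/2}}{\epsilon}\, M P_t^{(\epsilon)} \Big) \otimes P_t^{(\epsilon)}\, dt \tag{72d} $$
$$ \to \mathbb{X}^\dagger_{t_n, t_{n+1}} + \frac{\Delta t\, \gamma}{\epsilon}\, M\, \mathbb{E}^\dagger\big[ P_t^{(\epsilon)} \otimes P_t^{(\epsilon)} \big] \tag{72e} $$
$$ = \mathbb{X}^\dagger_{t_n, t_{n+1}} + \frac{\Delta t\, \gamma}{2}\, M. \tag{72f} $$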

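As discussed in detail in [9] already, Proposition 2 implies that the scheme (64)
does not, in general, converge to the scheme (30) as $\epsilon \to 0$ since

$$ J^\dagger_{t_n, t_{n+1}} = \lim_{\epsilon \to 0} J^{(\epsilon)}_{t_n, t_{n+1}} - \frac{\Delta t\, \gamma}{2}\, A^{\rm T} : M. \tag{73} $$

This observation suggests the following modification

$$ \Theta_{n+1}^{(\epsilon)} = \Theta_n^{(\epsilon)} + \frac{\sigma_n^{(\epsilon)}}{\gamma} \Big\{ \int_{t_n}^{t_{n+1}} (A X_t^{(\epsilon)})^{\rm T}\, dX_t^{(\epsilon)} - \frac{\Delta t\, \gamma}{2}\, A^{\rm T} : M \Big\} \; - \tag{74a} $$
$$ \tfrac{1}{2}\, K_n^{(\epsilon)} A X_{t_n}^{(\epsilon)} \big( \Theta_n^{(\epsilon)} + \pi_n^{(\epsilon)}[\theta] \big)\, \Delta t \tag{74b} $$

to (64). Please note that it follows from (70) that

$$ \int_{t_n}^{t_{n+1}} (A X_t^{(\epsilon)})^{\rm T}\, dX_t^{(\epsilon)} = A^{\rm T} : \Big( X^{(\epsilon)}_{t_{n+1/2}} \otimes X^{(\epsilon)}_{t_n, t_{n+1}} - \tfrac{1}{2} \int_{t_n}^{t_{n+1}} \big[ X^{(\epsilon)}_{t_n, t},\, dX_t^{(\epsilon)} \big] \Big). \tag{75} $$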


Proposition 3 The discrete-time EnKBF (62) converges to (20) for fixed
$\Delta t$ as $\epsilon \to 0$. Similarly, (74) converges to (30) under the same
limit.

Proof The first statement follows from $\sigma_n^{(\epsilon)} = \sigma_n$, the
limiting behaviour (67), and

$$ \lim_{\epsilon \to 0} K_n^{(\epsilon)} = K_n. \tag{76} $$

The second statement additionally requires (73) to be substituted into (74) when
taking the limit $\epsilon \to 0$.


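Remark 6 The analogous adaptation of (74) to the gradient descent formulation
(19) with $X_t^\dagger$ replaced by $X_t^{(\epsilon)}$ becomes

$$ \theta_{n+1}^{(\epsilon)} = \theta_n^{(\epsilon)} + \frac{\alpha_n}{\gamma} \Big( \int_{t_n}^{t_{n+1}} (A X_t^{(\epsilon)})^{\rm T}\, dX_t^{(\epsilon)} - \frac{\Delta t\, \gamma}{2}\, A^{\rm T} : M \; - \tag{77a} $$
$$ \theta_n^{(\epsilon)}\, (A X_{t_n}^{(\epsilon)})^{\rm T} A X_{t_n}^{(\epsilon)}\, \Delta t \Big). \tag{77b} $$

Alternatively, subsampling the data can be applied, which leads to the simpler
formulation

$$ \theta_{n+1}^{(\epsilon)} = \theta_n^{(\epsilon)} + \frac{\alpha_n}{\gamma}\, (A X_{t_n}^{(\epsilon)})^{\rm T} \Big( X_{t_{n+1}}^{(\epsilon)} - X_{t_n}^{(\epsilon)} - \theta_n^{(\epsilon)}\, A X_{t_n}^{(\epsilon)}\, \Delta t \Big). \tag{78} $$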
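The subsampled gradient descent update (78) only needs the data at the outer grid points. A minimal sketch of one such step follows; the function names and the noise-free test increment are assumptions made for illustration.

```python
def matvec(A, x):
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

def sgd_subsampled_step(theta, x_n, x_np1, A, alpha, gamma, dt):
    """One step of (78): gain (alpha/gamma) (A x_n)^T times the residual
    x_{n+1} - x_n - theta A x_n dt."""
    ax = matvec(A, x_n)
    residual = [xb - xa - theta * a * dt for xa, xb, a in zip(x_n, x_np1, ax)]
    return theta + alpha / gamma * sum(a * r for a, r in zip(ax, residual))

# Hypothetical noise-free increment generated with the true value theta = 1.
A = [[-0.5, 0.5], [-0.5, -0.5]]
x_n = [1.0, 0.0]
dt = 0.1
x_np1 = [x + a * dt for x, a in zip(x_n, matvec(A, x_n))]

at_truth = sgd_subsampled_step(1.0, x_n, x_np1, A, alpha=1.0, gamma=1.0, dt=dt)
from_zero = sgd_subsampled_step(0.0, x_n, x_np1, A, alpha=1.0, gamma=1.0, dt=dt)
```

The update leaves the true parameter value unchanged for noise-free data and moves any other value towards it.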


Remark 7 A two-scale SDE, closely related to (58), has been investigated in [8] in
terms of the time-integrated autocorrelation function of $P_t^{(\epsilon)}$ and
modified stochastic integrals. In our case, the modified quadrature rule, here
denoted by $\diamond$, has to satisfy

$$ \int_{t_n}^{t_{n+1}} (A X_t^\dagger)^{\rm T} \diamond dX_t^\dagger = \lim_{\epsilon \to 0} \int_{t_n}^{t_{n+1}} (A X_t^{(\epsilon)})^{\rm T}\, dX_t^{(\epsilon)}, \tag{79} $$

and it is therefore related to the standard Itô integral via

$$ \int_{t_n}^{t_{n+1}} (A X_t^\dagger)^{\rm T} \diamond dX_t^\dagger = \int_{t_n}^{t_{n+1}} (A X_t^\dagger)^{\rm T}\, dX_t^\dagger + \frac{\Delta t\, \gamma}{2}\, A^{\rm T} : M. \tag{80} $$

Hence M plays the role of the integrated autocorrelation function of
$P_t^{(\epsilon)}$ in our approach. We note that the modified quadrature rule
reduces to the standard Stratonovich integral if either $\beta = 0$ in (59) or A is
symmetric. While the results from [8] could, therefore, also be used as a starting
point for discussing the induced estimation bias, practical implementations would
still require knowledge of the integrated autocorrelation function of
$P_t^{(\epsilon)}$ or, equivalently, the estimation of M in addition to observing
$X_t^{(\epsilon)}$. We address this aspect next.
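The size of the correction term $(\Delta t\, \gamma / 2)\, A^{\rm T} : M$, and its reduction to the Stratonovich correction $(\Delta t\, \gamma / 2)\, {\rm tr}(A)$ when $\beta = 0$ or A is symmetric, can be checked with a few lines of arithmetic. The sketch assumes M of the form (59) with off-diagonal entries $\pm\beta$ and uses illustrative function names and test matrices.

```python
def frobenius(A, B):
    """A : B = tr(A^T B) = sum_ij A_ij B_ij."""
    return sum(A[i][j] * B[i][j] for i in range(len(A)) for j in range(len(A[0])))

def transpose(A):
    return [[A[j][i] for j in range(len(A))] for i in range(len(A[0]))]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def m_matrix(beta):
    """Assumed form of M in (59)."""
    return [[1.0, beta], [-beta, 1.0]]

def correction(A, M, gamma, dt):
    """The term (dt * gamma / 2) A^T : M appearing in (73), (74) and (80)."""
    return 0.5 * dt * gamma * frobenius(transpose(A), M)

A_nonsym = [[-0.5, 0.5], [-0.5, -0.5]]    # assumed non-symmetric example
A_sym = [[-1.0, 0.3], [0.3, -2.0]]        # hypothetical symmetric example
gamma, dt = 1.0, 0.06

strat_nonsym = 0.5 * dt * gamma * trace(A_nonsym)
beta_zero = correction(A_nonsym, m_matrix(0.0), gamma, dt)
sym_case = correction(A_sym, m_matrix(2.0), gamma, dt)
general = correction(A_nonsym, m_matrix(2.0), gamma, dt)
```

For these assumed matrices the general case gives $A^{\rm T} : M = -3$ rather than ${\rm tr}(A) = -1$, so the multi-scale correction differs from the Itô-Stratonovich correction by a factor of three.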


The numerical implementation of (74) requires an estimator for the generally
unknown M in (73). This task is challenging as we only have access to
$X_t^{(\epsilon)}$ without any explicit knowledge of the underlying generating
process (58). While the estimator proposed in [9] is based on the idea of
subsampling the data, the frequentist perspective taken in this note suggests the
alternative estimator $M_{\rm est}$ defined by

$$ \frac{\Delta t\, \gamma}{2}\, M_{\rm est} = \mathbb{E}^\dagger\big[ \mathbb{X}^{(\epsilon)}_{t_n, t_{n+1}} \big], \tag{81} $$

which follows from (72f) and (52). That is,
$\mathbb{E}^\dagger[\mathbb{X}^\dagger_{t_n, t_{n+1}}] = O(\Delta t^2)$ for
$\Delta t$ sufficiently small. Note that the second-order iterated integral
$\mathbb{X}^{(\epsilon)}_{t_n, t_{n+1}}$ satisfies (70) and is therefore easy to
compute. In practice, the frequentist expectation value can be replaced by an
approximation along a given single observation path $X_t^{(\epsilon)}$,
$t \in [0, T]$, under the assumption of ergodicity.
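A possible realisation of the estimator (81) along a single sampled path is sketched below: the iterated integral over each outer window is approximated by a left-point Riemann sum on the fine grid, and the expectation is replaced by the average over consecutive windows. The quadrature choice, the function names, and the smooth test path are assumptions made for illustration.

```python
def iterated_integral(window):
    """Left-point Riemann sum for int (X_t - X_{t_n}) (x) dX_t over one window,
    given as a list of states on the fine grid."""
    x0 = window[0]
    d = len(x0)
    out = [[0.0] * d for _ in range(d)]
    for xa, xb in zip(window[:-1], window[1:]):
        for i in range(d):
            for j in range(d):
                out[i][j] += (xa[i] - x0[i]) * (xb[j] - xa[j])
    return out

def estimate_M(path, steps_per_window, dt_outer, gamma):
    """Ergodic-average version of (81): M_est = 2 / (dt * gamma) * mean of the
    iterated integrals over consecutive outer windows."""
    d = len(path[0])
    acc = [[0.0] * d for _ in range(d)]
    count = 0
    for start in range(0, len(path) - steps_per_window, steps_per_window):
        xx = iterated_integral(path[start:start + steps_per_window + 1])
        for i in range(d):
            for j in range(d):
                acc[i][j] += xx[i][j]
        count += 1
    return [[2.0 * acc[i][j] / (count * dt_outer * gamma) for j in range(d)]
            for i in range(d)]

# Deterministic sanity check on the smooth path X(t) = (t, t^2), t in [0, 1],
# whose iterated integral is [[1/2, 2/3], [1/3, 1/2]].
L = 2000
path = [[l / L, (l / L) ** 2] for l in range(L + 1)]
xx = iterated_integral(path)
M_est = estimate_M(path, steps_per_window=L, dt_outer=1.0, gamma=1.0)
```

In an application, `path` would hold the observed $X_t^{(\epsilon)}$ at the inner step-size and `steps_per_window` the number of inner steps per outer step.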

An appropriate choice of the outer or sub-sampling step-size $\Delta t$ [27]
constitutes an important aspect for the practical implementation of the EnKBF
formulation (62) for finite values of $\epsilon > 0$ [26]. Consistency of the
second-order iterated integrals [13] implies

$$ \mathbb{X}^{(\epsilon)}_{t_n, t_{n+2}} = \mathbb{X}^{(\epsilon)}_{t_n, t_{n+1}} + \mathbb{X}^{(\epsilon)}_{t_{n+1}, t_{n+2}} + X^{(\epsilon)}_{t_n, t_{n+1}} \otimes X^{(\epsilon)}_{t_{n+1}, t_{n+2}}. \tag{82} $$
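The consistency relation (82), often called Chen's relation, also holds exactly for left-point iterated sums on a discrete grid. The following minimal sketch checks it on an arbitrary path; the path and the function names are chosen for illustration only.

```python
import math

def increment(path, a, b):
    """X_{a,b} = X_b - X_a for grid indices a and b."""
    return [q - p for p, q in zip(path[a], path[b])]

def iterated_sum(path, a, b):
    """Left-point iterated sum of (X_l - X_a) (x) (X_{l+1} - X_l) for l = a..b-1."""
    d = len(path[0])
    out = [[0.0] * d for _ in range(d)]
    for l in range(a, b):
        lead = increment(path, a, l)
        step = increment(path, l, l + 1)
        for i in range(d):
            for j in range(d):
                out[i][j] += lead[i] * step[j]
    return out

def chen_defect(path, a, m, b):
    """Largest entry of the difference between both sides of (82)."""
    whole = iterated_sum(path, a, b)
    left = iterated_sum(path, a, m)
    right = iterated_sum(path, m, b)
    xa, xb = increment(path, a, m), increment(path, m, b)
    d = len(xa)
    return max(abs(whole[i][j] - left[i][j] - right[i][j] - xa[i] * xb[j])
               for i in range(d) for j in range(d))

path = [[math.sin(0.3 * l), 0.1 * l * math.cos(0.2 * l)] for l in range(41)]
defect = chen_defect(path, 0, 20, 40)
```

The cross term $X_{t_n, t_{n+1}} \otimes X_{t_{n+1}, t_{n+2}}$ is exactly what is discarded when only subsampled data is assimilated, which is why its expectation is used to select the outer step-size.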


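A sensible choice of $\Delta t$ is dictated by

$$ \Big\| \mathbb{E}^\dagger\big[ X^{(\epsilon)}_{t_n, t_{n+1}} \otimes X^{(\epsilon)}_{t_{n+1}, t_{n+2}} \big] \Big\| = O(\Delta t^2), \tag{83} $$

that is, the sub-sampled data $X^{(\epsilon)}_{t_n}$ behaves to leading order like
solution increments from the reference model (2) at scale $\Delta t$ independent of
the specific value of $\epsilon$. Note that, on the other hand,

$$ \Big\| \mathbb{E}^\dagger\big[ X^{(\epsilon)}_{\tau_l, \tau_{l+1}} \otimes X^{(\epsilon)}_{\tau_{l+1}, \tau_{l+2}} \big] \Big\| = O(\epsilon^{-1} \Delta\tau^2) \tag{84} $$

for an inner step-size $\Delta\tau \sim \epsilon$. In other words, a suitable
step-size $\Delta t > 0$ can be defined by making

$$ h(\Delta t) := \Delta t^{-2}\, \Big\| \mathbb{E}^\dagger\big[ X^{(\epsilon)}_{t_n, t_{n+1}} \otimes X^{(\epsilon)}_{t_{n+1}, t_{n+2}} \big] \Big\| \tag{85} $$

as small as possible while still guaranteeing an accurate numerical approximation
in (62).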
Remark 8 The choice of the outer time step $\Delta t$ is less critical for the
EnKBF formulation (74) since it does not rely on sub-sampling the data and is
robust with regard to perturbations in the data, provided the appropriate M is
explicitly available or has been estimated from the available data using (81).
Furthermore, if A is symmetric, then it follows from (75) and the skew-symmetry of
the commutator $[\cdot, \cdot]$ that

$$ \int_{t_n}^{t_{n+1}} (A X_t^{(\epsilon)})^{\rm T}\, dX_t^{(\epsilon)} = A : \big( X^{(\epsilon)}_{t_{n+1/2}} \otimes X^{(\epsilon)}_{t_n, t_{n+1}} \big), \tag{86} $$

which can be used in (74). The same simplification arises when M is symmetric.
This insight is at the heart of the geometric rough path approach followed in [9],
which starts from the Stratonovich formulation (25) of the EnKBF. See also [28] on
the convergence of Wong-Zakai approximations for stochastic differential
equations. In all other cases, a more refined numerical approximation of the
data-driven integral in (74) is necessary; such as, for example, (31). For that
reason, we rely on the Itô/Euler-Maruyama interpretation of (68) in this note
instead, that is the approximation (12).


5 Numerical Example 


We consider the linear SDE (2) with $\gamma = 1$ and

$$ A = \frac{-1}{2} \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}. \tag{87} $$

We find that $C = I$ and $A^{\rm T} A = \tfrac{1}{2} I$. Hence
$(A^{\rm T} A) : C = 1$, and the posterior variance simply satisfies
$\sigma_t = \sigma_0 / (1 + \sigma_0 t)$ according to (44). We set
$m_{\rm prior} = 0$ and $\sigma_{\rm prior} = 4$ for the Gaussian prior
distribution of $\Theta_0$, and the observation interval is [0, T] with T = 6. We
find that $\sigma_T = 0.16$. Solving (39) for given $\sigma_t$ with initial
condition $m_0 = 0$ yields

$$ m_t = 1 - \frac{\sigma_t}{\sigma_0} \tag{88} $$

and $m_T = 0.96$. The corresponding curves are displayed in red in Fig. 2.
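The quoted values $\sigma_T = 0.16$ and $m_T = 0.96$ can be reproduced by integrating the moment equations (39) and (43) numerically. The sketch below uses a classical fourth-order Runge-Kutta scheme with $\gamma = 1$ and $(A^{\rm T} A) : C = 1$; the integrator, the step count, and the function names are assumptions made for illustration.

```python
def sigma(t, sigma0, q):
    """Posterior variance (44) written with q = (A^T A):C / gamma."""
    return sigma0 / (1.0 + q * sigma0 * t)

def rhs(t, m, p, sigma0, q):
    """Right-hand sides of (39) and (43) under the approximation (22)."""
    s = sigma(t, sigma0, q)
    return q * s * (1.0 - m), q * s * (s - 2.0 * p)

def integrate_moments(T, sigma0, q, m0=0.0, p0=0.0, n_steps=6000):
    """Classical RK4 integration of the frequentist mean m_t and variance p_t."""
    h = T / n_steps
    t, m, p = 0.0, m0, p0
    for _ in range(n_steps):
        k1 = rhs(t, m, p, sigma0, q)
        k2 = rhs(t + 0.5 * h, m + 0.5 * h * k1[0], p + 0.5 * h * k1[1], sigma0, q)
        k3 = rhs(t + 0.5 * h, m + 0.5 * h * k2[0], p + 0.5 * h * k2[1], sigma0, q)
        k4 = rhs(t + h, m + h * k3[0], p + h * k3[1], sigma0, q)
        m += h * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0
        p += h * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0
        t += h
    return m, p

sigma_T = sigma(6.0, 4.0, 1.0)
m_T, p_T = integrate_moments(6.0, 4.0, 1.0)
```

The computed frequentist variance stays below the posterior variance at the final time, in agreement with the bound $p_t \le \sigma_t$ stated after (44).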

We implement the EnKBF schemes (20) and (30) with $t_n = n\,\Delta t$. The inner
time-step is $\Delta\tau = 10^{-4}$ while $\Delta t = 0.06$, that is, L = 600. We
repeat the experiment $N = 10^4$ times and compare the outcome with the predicted
mean value of $m_T = 0.96$ and the posterior variance of $\sigma_T = 0.16$ in
Fig. 2. The differences in the computed time evolutions of $m_t$ and $p_t$ are
rather minor and support the idea that it is not necessary to assimilate
continuous-time data beyond $\Delta t$. We also find that the simple prediction
(88), based on standard Kalman filter theory, is not very accurate for this
low-dimensional problem (d = 2). The corresponding approximation for $\sigma_t$
provides, however, a good upper bound for $p_t$.

Fig. 2 (a-b) Frequentist mean, $m_t$, and variance, $p_t$, from EnKBF
implementation (20) with step-size $\Delta t = 0.06$; (c-d) same results from
EnKBF implementation (30) with inner time-step $\Delta\tau = \Delta t / 600$. We
also display the curves arising for $\sigma_t$ and $m_t$ from the standard Kalman
theory using the approximation (22). Note that the posterior variance,
$\sigma_t$, should provide an upper bound on the frequentist uncertainty $p_t$

Fig. 3 Same experimental setting as in Fig. 2 but with the data now generated from
the multi-scale SDE (58). Again, subsampling the data in intervals of
$\Delta t = 0.06$ and high-frequency assimilation with step-size
$\Delta\tau = 10^{-4}$ lead to very similar results in terms of their frequentist
means and variances

We now replace the data generating SDE model (2) by the multi-scale formulation
(58) with $\epsilon = 0.01$ and $\beta = 2$. This parameter choice agrees with the
one used in [9]. We again find that assimilating the data at the slow time-scale
$\Delta t = 0.06$ leads to results very similar to those obtained from an
assimilation at the fast time-scale $\Delta\tau = 10^{-4}$ with the EnKBF
formulation (74), provided the correction term resulting from the second-order
iterated integral (73) is included (see Fig. 3). We also verified numerically that
$\Delta t = 0.06$ constitutes a nearly optimal step-size in the sense of making
(85) sufficiently small while maintaining numerical accuracy. For example,
reducing the outer step-size to $\Delta t = 0.02$ leads to
$h(0.02) - h(0.06) \approx 10$ in (85).
6 Conclusions


In this follow-up note to [9], we have investigated the impact of subsampling and/or
high-frequency data assimilation on the corresponding conditional mean estimators,
$\mu_t$, both for data generated from the standard SDE model and a modified
multi-scale SDE. A frequentist analysis supports the basic finding that both
approaches lead to comparable results provided that the systematic biases due to
different second-order iterated integrals are properly accounted for. While the
EnKBF is relatively easy to analyse and a full rough path approach can be avoided,
extending these results to the nonlinear feedback particle filter [26, 9] will
prove more challenging. Extensions to systems without a strong scale separation
[4, 31] and applications to geophysical fluid dynamics [22, 12] are also of
interest. In this context, the approximation quality of the proposed estimator
(81) and the choice of the step-size $\Delta t$ following (85) (and potentially
$\Delta\tau$) will be of particular interest. Finally, while we have investigated
the univariate parameter estimation problem, a semi-parametric parametrisation of
the drift term f in (1), such as random feature maps [21], leads to
high-dimensional parameter estimation problems and their statistics [19, 20]. This
provides another fertile direction for future research.
Random Ocean Swell-Rays: A Stochastic Framework


Valentin Resseguier, Erwan Hascoët, and Bertrand Chapron 


1 Introduction 


Originating from distant storms, swell systems radiate across all ocean basins 
(Snodgrass et al., 1966; Collard et al., 2009; Ardhuin et al., 2009). Far from their 
sources, emerging surface waves have low steepness characteristics, with very 
slow amplitude variations. Swell propagation then closely follows principles of 
geometrical optics, i.e. the eikonal approximation to the wave equation, with a 
constant wave period along geodesics, when following a wave packet at its group 
velocity. The phase averaged evolution of quasi-linear wave fields is then dominated 
by interactions with underlying current and/or topography changes (Phillips, 1977). 
Comparable to the propagation of light in a slowly varying medium, over many 
wavelengths, cumulative effects can lead to refraction, i.e. change of the direction of 
propagation of a given wave packet, so that it departs from its initial ray-propagation 
direction. This opens the possibility of using surface swell systems as probes to 
estimate turbulence along their propagating path. 
For a single progressive swell wave train, a description of the form 


$$ h(x, t) = a(x, t) \, e^{i \varphi(x, t)}, \qquad (1) $$
is locally possible for most wave properties, i.e. the surface elevation, slope, orbital
velocities. If the wave-ray propagation is to be followed, or predicted, the phase,
$\varphi(x, t)$, must vary smoothly along the wave's path. Mathematically, $\varphi(x, t)$ is
required to be differentiable, to define the relative frequency


$$ \omega = -\partial_t \varphi(x, t), \qquad (2) $$
and the wave number vector
$$ k = \nabla \varphi(x, t). \qquad (3) $$

Since these partial derivatives of $\varphi(x, t)$ are independent of the differentiation order,
the kinematical conservation equation for the density of waves reads

$$ -\nabla \omega = \partial_t k, \qquad (4) $$
with the irrotational condition
$$ \nabla \times k = 0, \qquad (5) $$


to serve as an initial condition for use with Kelvin’s circulation theorem. The rate 
of change of the wave-number is balanced by the convergence of the frequency, the 
number of wave crests passing a fixed point. 

Let us now consider an ocean moving with velocity v, slowly varying with 
respect to time and space. The frequency of wave crests passing a fixed point, i.e. 
the apparent frequency, becomes 


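$$ \omega = \omega_0 + v \cdot k, \qquad (6) $$

with $\omega_0 = f(k, H)$ the intrinsic frequency, $H$ the depth, whose functional
dependence on $k$ is known. For gravity waves, this dispersion relationship is

$$ \omega_0 = \sqrt{g \|k\| \tanh(\|k\| H)}, \qquad (7) $$
and thus
$$ \partial_t k + \partial_k \omega_0 \nabla k + \partial_H \omega_0 \nabla H + (l \cdot v) \nabla \|k\| + \|k\| \nabla (l \cdot v) = 0, \qquad (8) $$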


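where $l$ is a unit vector in the direction of the wave vector $k$, whose magnitude $\|k\|$ is
also written $k$ when used as a scalar. Consequently, for a steady
wave train, the variation of the wave-number magnitude along the propagation direction $s$ is

$$ \partial_s \|k\| = -(c_g + l \cdot v)^{-1} \left[ \partial_H \omega_0 \, \partial_s H + \|k\| \, \partial_s (l \cdot v) \right], \qquad (9) $$

with $c_g = \partial_k \omega_0$ the local group velocity. Using the irrotational condition, the
evolution of the ray direction, $\theta(s)$, follows
$$ \partial_s \theta = -(c_g + l \cdot v)^{-1} \left[ \frac{1}{\|k\|} \partial_H \omega_0 \, \partial_\nu H + \partial_\nu (l \cdot v) \right], \qquad (10) $$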


where $\nu$ is the unit vector normal to the direction of the ray. Accordingly, wave
trajectories will bend with depth variations. For deep water, the dispersion relationship
reduces to $\omega_0 = \sqrt{g \|k\|}$, and $\theta(s)$ solely depends upon the ratio between
the cross-ray current gradient and the local group velocity. More generally, this
result extends to the ray curvature, being to first order controlled by $\zeta / c_g$, the
ratio between $\zeta = \nabla \times v$, the vertical component of the current vorticity, and
$c_g = \partial_k \omega_0 = \omega_0 / (2 \|k\|)$, the group velocity. Accordingly, the rays will bend in the
direction of decreasing (increasing) current speed. Moreover, a potential velocity 
field will give little refraction. Yet, a potential velocity field will control the variation 
of the wave-number magnitude, and thus the group velocity and bending, along the 
propagation. 
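As a quick numerical illustration of the dispersion relationship (7) and of its deep-water limit, the following minimal Python sketch evaluates the intrinsic frequency and estimates the group speed by a centred finite difference. The function names, the depths and the 250 m wavelength are illustrative assumptions, not values prescribed by the derivation above.

```python
import math

G = 9.81  # gravitational acceleration (m s^-2)


def intrinsic_frequency(k, depth):
    """Intrinsic frequency omega_0 = sqrt(g k tanh(k H)) for a wavenumber magnitude k (rad/m)."""
    return math.sqrt(G * k * math.tanh(k * depth))


def group_velocity(k, depth, rel_step=1e-6):
    """Group speed c_g = d(omega_0)/dk, estimated by a centred finite difference."""
    dk = rel_step * k
    return (intrinsic_frequency(k + dk, depth) - intrinsic_frequency(k - dk, depth)) / (2.0 * dk)


# Illustrative case (assumed values): a 250 m swell over a 4000 m deep ocean.
k_swell = 2.0 * math.pi / 250.0
omega_swell = intrinsic_frequency(k_swell, 4000.0)
print("omega_0 =", omega_swell, "rad/s")
print("c_g (finite difference) =", group_velocity(k_swell, 4000.0), "m/s")
print("c_g (deep-water formula omega_0 / 2k) =", omega_swell / (2.0 * k_swell), "m/s")
```

In deep water the finite-difference estimate and the closed form $\omega_0 / (2 \|k\|)$ should agree closely, while for very long waves over shallow depth the group speed tends to $\sqrt{g H}$.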

To specify the local linear wave propagation, a precise knowledge of the surface 
currents, local gradients and/or vorticity, thus appears essential. In a realistic numer- 
ical setting, Ardhuin et al. (2017) clearly demonstrated that wave energy variations 
would largely be dominated by the effects of ocean currents at scales of about 
10-100 km. From altimeter ocean surface wave energy measurements, Quilfen
and Chapron (2019) also showed that mesoscale and sub-mesoscale upper ocean 
circulation can drive a significant part of the wave variability in the coupled ocean- 
atmosphere system. Unfortunately, these small-scale currents are not observed 
and certainly not resolved in operational models. Today, precise spatio-temporal
information is thus largely missing. To overcome these observation difficulties, and
to best take into account unresolved small-scale currents, a stochastic framework
can be adopted. Such a stochastic model shall then provide means to perform fast 
simulations and test ensembles of wave-propagation predictions, to best evaluate 
impacts of underlying near-surface small-scale currents on the evolution of ocean 
surface swell systems. 


2 Random Swell-Rays 


To first order in wave steepness, the group velocity vg is modified by the local 
velocity of the currents v, 


$$ \frac{dx}{dt} = v_g = \nabla_k \omega = \underbrace{\nabla_k \omega_0(k)}_{\substack{\text{group velocity without currents} \\ \text{but changing wave vector}}} + v, \qquad (11) $$

where x is the centroid of a wave group. The ray direction can thus differ from 
the direction of the wave vector, except in the case of parallel wave and current 
directions. Unlike depth refraction, the crest alignment does not indicate the wave 
propagation direction. The coupled wave vector evolution reads

$$ \frac{dk}{dt} = -\nabla v^T k. \qquad (12) $$
Along the propagation ray, velocity gradients induce linear variations. Decelerating 
currents will shorten waves, and thus reduce the group velocity. The validity of this 
coupled ray approximation largely depends on the condition $\|k\| \ell \gg 1$, where $\ell$ is
a length scale on which the current field is varying, physically corresponding to the
typical eddy size. This condition is well satisfied for wave numbers of interest, of
order $\|k\| \sim 2\pi/250$ rad m$^{-1}$, and typical eddy size $\ell \sim 5$ km or larger. Scattering
of the waves by currents can further be assumed to be weak, with $\|v\|$ of order
0.5 m/s, much smaller than $\|v_g\|$ of order 10 m/s. Subsequently, each ray will be
appreciably deflected, with scattering angle of order $\|v\| / \|v_g\|$ after traveling a
typical correlation length $\ell$ along the mean wave vector direction.
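The orders of magnitude quoted above can be checked with a few lines of Python. This is only a sketch under the deep-water assumption; the helper names are hypothetical, and the inputs (250 m wavelength, 5 km eddies, 0.5 m/s currents) are the representative values from the text.

```python
import math

G = 9.81  # gravitational acceleration (m s^-2)


def deep_water_group_speed(wavelength):
    """Deep-water group speed c_g = 0.5 * sqrt(g / k), with k = 2 pi / wavelength."""
    k = 2.0 * math.pi / wavelength
    return 0.5 * math.sqrt(G / k)


def ray_regime(wavelength, eddy_size, current_speed):
    """Return (k * l, |v| / |v_g|): scale-separation parameter and scattering-angle estimate."""
    k = 2.0 * math.pi / wavelength
    return k * eddy_size, current_speed / deep_water_group_speed(wavelength)


scale_separation, scattering_angle = ray_regime(250.0, 5.0e3, 0.5)
print(f"k * l = {scale_separation:.1f}")
print(f"scattering angle ~ {scattering_angle:.3f} rad")
```

A scale-separation parameter well above one supports the ray approximation, and a small ratio of current speed to group speed supports the weak-scattering assumption.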

To complete the wave field description, the wave action A(x, t) is considered to 
be an adiabatic invariant. Wave action is crucial to anticipate wave transformations 
by currents (White and Fornberg, 1998). This action is the integral of the action 
spectrum N (x, k, t) over all the wave-vectors k: 


$$ A(x, t) = \int dk \, N(x, k, t). \qquad (13) $$
The wave action spectrum $N$ is the action per unit of surface (unit of $x$) and per unit
of wave-vector surface (unit of $k$). For linear waves, the wave action spectrum is
simply related to the wave energy spectrum $E$:

$$ E(x, k, t) = N(x, k, t) \, \omega_0(k). \qquad (14) $$


By the Liouville theorem, the $(x, k)$ space neither contracts nor dilates over
time. Since dissipation is neglected, the wave action spectrum $N$ is thus
conserved (Lavrenov, 2013), i.e.

$$ N(x(t_i), k(t_i), t_i) = N(x(t_f), k(t_f), t_f), \qquad (15) $$

along the following $(x, k)$ variable change between the initial time $t_i$ and the final time $t_f$:

$$ \begin{pmatrix} x(t_i) \\ k(t_i) \end{pmatrix} \mapsto \begin{pmatrix} x(t_f) \\ k(t_f) \end{pmatrix}. \qquad (16) $$
Subsequently, each Fourier mode of a swell wave train can be modified independently
of the others. In the absence of source terms, the action spectrum conservation
(15) then reads:

$$ \frac{dN}{dt} = \partial_t N + v_g \cdot \nabla_x N + (-\nabla_x v^T k) \cdot \nabla_k N = 0. \qquad (17) $$
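The characteristics (11) and (12) along which the action spectrum is conserved can be integrated numerically. The sketch below is a minimal illustration, not the authors' code: it assumes deep-water waves, a hypothetical uniform shear current $v = (S y, 0)$, and a Heun (second-order Runge-Kutta) integrator chosen here for simplicity. For this current, (12) gives a constant $k_x$ and a $k_y$ that varies linearly in time, which provides a simple check.

```python
import math

G = 9.81  # gravitational acceleration (m s^-2)


def group_velocity_deep(kx, ky):
    """Intrinsic deep-water group velocity grad_k omega_0, with omega_0 = sqrt(g |k|)."""
    k_norm = math.hypot(kx, ky)
    speed = 0.5 * math.sqrt(G / k_norm)
    return speed * kx / k_norm, speed * ky / k_norm


def shear_current(x, y, shear):
    """Hypothetical current v = (shear * y, 0) and its gradient entries d_i v_j."""
    velocity = (shear * y, 0.0)
    # Order: d_x v_x, d_y v_x, d_x v_y, d_y v_y
    gradient = (0.0, shear, 0.0, 0.0)
    return velocity, gradient


def ray_rhs(state, shear):
    """Right-hand side of dx/dt = grad_k omega_0 + v and dk/dt = -(grad v)^T k."""
    x, y, kx, ky = state
    (vx, vy), (dvx_dx, dvx_dy, dvy_dx, dvy_dy) = shear_current(x, y, shear)
    cgx, cgy = group_velocity_deep(kx, ky)
    dkx = -(dvx_dx * kx + dvy_dx * ky)  # -(d_x v_j) k_j
    dky = -(dvx_dy * kx + dvy_dy * ky)  # -(d_y v_j) k_j
    return (cgx + vx, cgy + vy, dkx, dky)


def integrate_ray(state, shear, dt, n_steps):
    """Advance (x, y, kx, ky) with Heun's predictor-corrector scheme."""
    for _ in range(n_steps):
        slope_start = ray_rhs(state, shear)
        predictor = tuple(s + dt * d for s, d in zip(state, slope_start))
        slope_end = ray_rhs(predictor, shear)
        state = tuple(s + 0.5 * dt * (a + b) for s, a, b in zip(state, slope_start, slope_end))
    return state


# Illustrative ray (assumed values): 250 m swell heading mostly north, 10 hours of travel.
initial = (0.0, 0.0, 0.005, 2.0 * math.pi / 250.0)
final = integrate_ray(initial, shear=1.0e-5, dt=60.0, n_steps=600)
print("final position (m):", final[0], final[1])
print("final wave vector (rad/m):", final[2], final[3])
```

Replacing `shear_current` with an interpolated velocity field would give a forward ray tracer of the kind described in the numerical section, with the caveat that interpolation and boundary handling are left out here.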


3 The Time-Decorrelation Assumption 


Now, the Eulerian current $v$ is decomposed into a large-scale component $\bar{v}$ and a
small-scale unresolved component $v'$:


$$ v = \bar{v} + v'. \qquad (18) $$


In a stochastic framework, we can work with the Stratonovich notations 
(Oksendal, 1998; Kunita, 1997). Under Stratonovich calculus rules, expressions 
become similar to deterministic ones. The Stratonovich dispersion relation is 
analogous to the deterministic one (6). The method of characteristics, (11), (12) and (15), is also valid,
with $v'$ defined by $\sigma \circ dB_t / dt$, where $dB_t / dt$ is a spatio-temporal
white noise and $\sigma$ denotes a spatial filter which encodes spatial correlations and
horizontal incompressibility ($\nabla \cdot \sigma = 0$). For a spatially stationary and isotropic
small-scale velocity, the wave characteristic dynamics equations (11), (12) and (15)
would then also remain the same with Ito notations (i.e. we can replace $\sigma \circ dB_t$ by
$\sigma dB_t$ to derive the evolution). With Ito notations, the action spectrum conservation
(17) reads


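$$ \partial_t N + v_g \cdot \nabla_x N + (-\nabla_x v^T k) \cdot \nabla_k N = \nabla_{(x,k)} \cdot \left( D \, \nabla_{(x,k)} N \right), \qquad (19) $$


where $\nabla_{(x,k)}$ is the gradient with respect to the joint variable $(x, k)$, $v_g$ and $v$ include the random small-scale component $v' = \sigma dB_t / dt$, and


$$ D = \frac{1}{2 \, dt} \, \mathbb{E} \left\{ \begin{pmatrix} \sigma dB_t \\ -\nabla_x (\sigma dB_t)^T k \end{pmatrix} \begin{pmatrix} \sigma dB_t \\ -\nabla_x (\sigma dB_t)^T k \end{pmatrix}^T \right\}. \qquad (20) $$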


Compared to (17), a RHS diffusive term appears, likely acting to increase the initial
directional spread of the highly directional incident swell components.

Voronovich (1991) and White and Fornberg (1998) discussed the joint random 
evolution changes of the coupled (x, k), i.e. the location and the wave vector of 
waves, subject to a random current v. Considering the wave train to undergo slow 
changes over the typical time to travel through the typical correlation length of the 
underlying current, the joint time evolution of (x, k) can be approximated to be 
driven by a diffusion Markov process. 
3.1 The Ray Lagrangian Correlation Time


To apply (19), the covariance of the small-scale unresolved component $v'$, in the
wave group frame, is thus to be assessed:


$$ \gamma(\tau) = \mathbb{E} \left( v'(t, X_r(t)) \cdot v'(t + \tau, X_r(t + \tau)) \right) = \gamma_{v'}(\tau, X_r(t + \tau) - X_r(t)), \qquad (21) $$


where $\gamma_{v'}$ is the (Eulerian) spatio-temporal covariance of $v'$, assuming statistical
homogeneity and stationarity for $v'$. Assume a typical isotropic form for this
covariance:

$$ \gamma_{v'}(t, x) = f \left( \frac{|t|}{\tau_{v'}} + \frac{\|x\|}{l_{v'}} \right), \qquad (22) $$
then,
$$ \gamma(\tau) = f \left( \frac{|\tau|}{\tau_{v'}} + \frac{\|X_r(t + \tau) - X_r(t)\|}{l_{v'}} \right) \approx f \left( \left( \frac{1}{\tau_{v'}} + \frac{\|v_g^0\|}{l_{v'}} \right) |\tau| \right), \qquad (23) $$
for small time increment $\tau$. Therefore, $\left( \frac{1}{\tau_{v'}} + \frac{\|v_g^0\|}{l_{v'}} \right)^{-1}$ is the correlation time of
$v'(t, X_r(t))$. The same derivation is valid for $\nabla (v')^T (t, X_r(t))$. Over deep ocean,
the swell wave group velocity is $\|v_g^0\| = \|\nabla_k \omega_0\| = \frac{1}{2} \sqrt{g / \|k\|}$ and the along-ray
correlation time of the small-scale velocity can be approximated by $l_{v'} / \|v_g^0\|$. The
ratio $\epsilon$ between this along-ray correlation time and the characteristic time of the
wave group properties evolution will then control the time decorrelation assumption
of $v'$:

$$ \epsilon = \frac{l_{v'}}{\|v_g^0\|} \|\nabla v^T\|. \qquad (24) $$


Note the Eulerian small-scale velocity $v'$ is not necessarily time uncorrelated. Yet,
for small enough $\epsilon$, the Lagrangian small-scale velocity along the ray can be
considered time uncorrelated. From the expression of $\epsilon$, such a condition depends
upon:


- $\|v_g^0\|$, increasing with the square root of the wave-group wavelength. Hence, $\epsilon$
decreases with the square root of the wave-group wavelength.

- $l_{v'}$, defined by the separation between large scales $\bar{v}$ and small scales $v'$, e.g. the
spatial filtering cutoff of the large-scale velocity $\bar{v}$.

- $\|\nabla v^T\|$ (which is different from $\|\nabla (v')^T\|$), related to the overall kinetic energy
(KE) and its high-wavenumber spectral slope.
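The dependencies listed above can be made concrete with a short Python sketch. It assumes that the decorrelation ratio (24) is read as the along-ray correlation time $l_{v'} / \|v_g^0\|$ multiplied by a velocity-gradient norm, with the deep-water group speed; the function name and the numerical inputs are hypothetical and only serve to show the scalings.

```python
import math

G = 9.81  # gravitational acceleration (m s^-2)


def decorrelation_ratio(wavelength, cutoff_length, gradient_norm):
    """epsilon = (l / |v_g^0|) * |grad v^T| for deep-water swell of the given wavelength."""
    k = 2.0 * math.pi / wavelength
    group_speed = 0.5 * math.sqrt(G / k)
    along_ray_correlation_time = cutoff_length / group_speed
    return along_ray_correlation_time * gradient_norm


# Assumed inputs: 30 km cutoff and a 1e-5 s^-1 velocity gradient.
eps_250 = decorrelation_ratio(250.0, 3.0e4, 1.0e-5)
eps_1000 = decorrelation_ratio(1000.0, 3.0e4, 1.0e-5)
print("epsilon for a 250 m swell:", eps_250)
print("epsilon for a 1000 m swell:", eps_1000)
```

Quadrupling the wavelength halves the ratio, and doubling the cutoff length doubles it, in line with the first two items of the list.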
3.2 Ray Absolute Diffusivity


The absolute diffusivity (or Kubo-type formula) usually corresponds, in the so-called
diffusive regime, to the variance per unit of time of a fluid particle Lagrangian
path, $\frac{d}{dt} X = v$. It is approximately equal to the velocity variance times its correlation
time. The Eulerian velocity covariance (22) will thus induce an absolute diffusivity


$$ \int_0^\infty d\tau \, \gamma_{v'}(\tau, X(t + \tau) - X(t)) \approx \gamma_{v'}(0, 0) \, \tau_{v'}. \qquad (25) $$


Here, a wave group is followed along its propagation, and a ray absolute diffusivity
slightly differs from the usual absolute diffusivity to become


$$ \int_0^\infty d\tau \, \gamma(\tau) \approx \left( \frac{1}{\tau_{v'}} + \frac{\|v_g^0\|}{l_{v'}} \right)^{-1} \gamma(0) \approx \frac{l_{v'}}{\|v_g^0\|} \gamma(0). \qquad (26) $$


In the Fourier space, the current Absolute Diffusivity Spectral Density (ADSD)
(Resseguier et al., 2020) associated with the wave dynamics is defined by


$$ A^{k^w}(k) = \frac{1/k}{\|v_g^0(k^w)\|} E(k), \qquad (27) $$


where $k^w$ denotes the wave wave-vector, $k$ the current wave number and $E$ the
current kinetic energy spectrum. Accordingly, for noise calibration, we assume $A^{k^w}$
self-similar and we choose a divergence-free spatial filter $\nabla^\perp \psi_\sigma$ such that
$v' = \sigma dB_t / dt = \nabla^\perp \psi_\sigma * dB_t / dt$ and $\|\widehat{\sigma dB_t}(k)\|^2 / dt = |k \, \hat{\psi}_\sigma(k)|^2 = A^{k^w}(k)$.


3.3 A Practical Estimation 


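To simplify (20), let us consider the solution for a homogeneous and isotropic
small-scale velocity $v' = \sigma dB_t / dt = \nabla^\perp \psi_\sigma * dB_t / dt$ and Matérn stream function
covariance, $(\psi_\sigma * \psi_\sigma)$, leading to


$$ D = \frac{1}{2 \, dt} \begin{pmatrix} \mathbb{E} \left\{ (\sigma dB_t)(\sigma dB_t)^T \right\} & 0 \\ 0 & \mathbb{E} \left\{ \left( \nabla_x (\sigma dB_t)^T k \right) \left( \nabla_x (\sigma dB_t)^T k \right)^T \right\} \end{pmatrix} \qquad (28) $$
$$ = \frac{1}{2} \begin{pmatrix} a_0 \, \mathbb{I}_d & 0 \\ 0 & c_{\nabla u} \left[ k k^T + 3 k^\perp (k^\perp)^T \right] \end{pmatrix}, \qquad (29) $$

where $a_0 = \frac{1}{d \, dt} \mathbb{E} \|\sigma dB_t\|^2$ and $c_{\nabla u} = \frac{1}{8 \, dt} \mathbb{E} \|\nabla_x (\sigma dB_t)^T\|^2$ are constants depending
on both the correlation length and the spectrum slope of the small-scale velocity.
The Ito action spectrum equation (19) then reads: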


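$$ \begin{aligned} \partial_t N &+ v_g \cdot \nabla_x N + (-\nabla_x v^T k) \cdot \nabla_k N \\ &= \nabla_x \cdot \left( \tfrac{1}{2} a_0 \nabla_x N \right) + \nabla_k \cdot \left( \tfrac{1}{2} c_{\nabla u} \left[ k k^T + 3 k^\perp (k^\perp)^T \right] \nabla_k N \right) \qquad (30) \\ &= \tfrac{1}{2} a_0 \Delta_x N + \tfrac{1}{2} c_{\nabla u} \frac{1}{\|k\|} \partial_{\|k\|} \left( \|k\|^3 \partial_{\|k\|} N \right) + \tfrac{3}{2} c_{\nabla u} \partial_\theta^2 N. \qquad (31) \end{aligned} $$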


The ensemble mean then follows: 


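$$ \begin{aligned} \partial_t \mathbb{E} N &+ \bar{v}_g \cdot \nabla_x \mathbb{E} N + (-\nabla_x \bar{v}^T k) \cdot \nabla_k \mathbb{E} N \\ &= \tfrac{1}{2} a_0 \Delta_x \mathbb{E} N + \tfrac{1}{2} c_{\nabla u} \frac{1}{\|k\|} \partial_{\|k\|} \left( \|k\|^3 \partial_{\|k\|} \mathbb{E} N \right) + \tfrac{3}{2} c_{\nabla u} \partial_\theta^2 \mathbb{E} N. \qquad (32) \end{aligned} $$


This last RHS diffusion term along the ray direction $\theta$ is then reminiscent of Eq.
3.16 in Boas and Young (2020) and Eq. 36 in Smit and Janssen (2019), derived
under the same isotropic and homogeneous turbulence assumptions.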
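A diffusion term in the ray direction has a simple Lagrangian counterpart: each ray angle performs a Brownian motion. The sketch below is an illustration under stated assumptions, not the authors' implementation. It simulates $d\theta = \sqrt{2 D_\theta} \, dW_t$ with an Euler-Maruyama scheme, where `d_theta` is a stand-in for whatever coefficient multiplies $\partial_\theta^2$ in the mean action equation, and all numerical values are hypothetical.

```python
import math
import random


def simulate_ray_angles(d_theta, dt, n_steps, n_rays, seed=0):
    """Euler-Maruyama simulation of d(theta) = sqrt(2 * d_theta) dW for independent rays."""
    rng = random.Random(seed)
    step_std = math.sqrt(2.0 * d_theta * dt)
    angles = []
    for _ in range(n_rays):
        theta = 0.0  # all rays start with the same direction
        for _ in range(n_steps):
            theta += step_std * rng.gauss(0.0, 1.0)
        angles.append(theta)
    return angles


def sample_variance(values):
    """Unbiased sample variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


# Assumed values: angular diffusivity 1e-7 rad^2/s, 10-minute steps, about 8 hours of travel.
angles = simulate_ray_angles(d_theta=1.0e-7, dt=600.0, n_steps=50, n_rays=4000)
print("simulated directional variance:", sample_variance(angles))
print("theoretical value 2 * d_theta * t:", 2.0 * 1.0e-7 * 600.0 * 50)
```

The directional variance grows linearly in time, which is the Lagrangian signature of the increase in directional spread discussed above.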


4 Numerical Simulations 


To illustrate our purpose, we consider the Surface Quasi-Geostrophic dynamics 
(Pierrehumbert, 1994; Lapeyre, 2017), abbreviated SQG: 


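$$ (\partial_t + v \cdot \nabla) \left( \frac{b}{N} \right) = 0 \quad \text{with} \quad v = v_{SQG} = \nabla^\perp (-\Delta)^{-1/2} \left( \frac{b}{N} \right). \qquad (33) $$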


Note that real upper-ocean currents may not strictly follow SQG. Still, after a wind
burst, it can be a good approximation at many mid-latitude locations. SQG
corresponds to dynamics with extreme locality, i.e. a KE spectrum with a shallow
slope $-5/3$. Hence, for fixed KE value, a larger current gradient $\|\nabla v^T\|$ is expected.
The validity of the time-decorrelation assumption of Sect. 3 will then depend upon 
the scale separation, defining the correlation length of the unresolved scales. 

A reference simulation is obtained at a resolution 512 × 512 for a 1000-km
squared domain, through a pseudo-spectral code (Resseguier et al., 2017, 2020).
Once initialized, the current velocity $v$ is about 0.1 m s$^{-1}$.

A swell system enters the southern boundary, propagating to the north. The 
carrier incident wave has a wavelength $\lambda = 250$ m. Its envelope is Gaussian
with an isotropic spatial extension of $30 \lambda$. Figure 1 illustrates the branched regime
in this homogeneous SQG turbulence. This regime spreads the positions (left
panel) and wavevectors (right panel) of the incoming waves. From south to north,
spectral diffusion occurs (right panel), in the direction orthogonal (here $k_x$) to
the propagation (here $k_y$). This accelerates, along the propagation, the zonal
wave position spread, to create the branched regime visible in the left panel. This
acceleration is explained by the ray equation (11) dominated by the intrinsic wave
group velocity $\nabla_k \omega_0 = c_g \, k / \|k\|$.

Fig. 1 Swell interacting with a high-resolution (512 × 512) deterministic SQG current. The left panel shows ray trajectories computed by forward advection and superimposed on the current vorticity $\omega = \nabla^\perp \cdot v$. The right panel shows bidirectional wave spectra, computed by backward advection, at 8 locations along a meridional axis (the mean wave propagation direction)

To mimic a badly resolved $v$, the current $v$ is smoothed at a resolution 32 × 32.
Wave dynamics using this coarse-scale current are shown in Fig. 2. The branched
regime is strongly weakened, i.e. the spectral small-scale turbulence diffusion is
missing.

A stochastic current is then added to this coarse deterministic one. That stochastic 
component is divergence-free and has a self-similar distribution of energy across 
spatial scales. Its precise parametrisation is a modification of the ADSD calibration 
(Resseguier et al., 2020) (see Sect. 3.2). Figure 3 displays the wave simulations. 
This white-in-time model appears to work for a sufficiently well-resolved large-scale
current. Indeed, the decorrelation ratio $\epsilon = (l_{v'} / \|v_g^0\|) \|\nabla v^T\|$ depends on this
resolution through $l_{v'}$. Specifically, for this SQG flow, the large-scale current $\bar{v}$ needs
to be resolved at least on a 32 × 32 grid, i.e. with a resolution $l_{v'} = 31.3$ km. As
such, we obtain $\epsilon = 3.23 \times 10^{-2}$ (computed with $1 / \|\nabla v^T\| = 1.38 \times 10^5$ s and
$c_g = 10$ m s$^{-1}$).


5 Conclusion 


The presence of velocity variations results in random scattering of swell-wave rays. 
Interactions are weak, but cumulative effects can become significant, to increase 
the average path length taken by the swell energy to reach an observer. Sufficiently
precise measurements now open the possibility of using along-ray
observations to probe the near-surface ocean turbulence. Under a Lagrangian
time-decorrelation assumption and using geometrical optics, a practical stochastic 
framework helps express these scattering effects on the mean swell-action statistics, 
directly in terms of the KE spectrum of the unresolved surface current field. Results 
are presented in both Lagrangian and Eulerian forms, where the latter augments 
the initial radiative transport equation with a diffusive term in directional space. 
Measured delays in swell arrivals, estimated wave height spectral characteristics 
and decays, and/or varying directional spread of the swell field shall then be more 
quantitatively interpreted to infer regional and seasonal upper ocean dynamical 
properties. 
Fig. 2 Swell interacting with a low-resolution (32 × 32) deterministic SQG current. The left panel shows ray trajectories computed by forward advection and superimposed on the low-resolution current vorticity $\bar{\omega} = \nabla^\perp \cdot \bar{v}$. The right panel shows bidirectional wave spectra, computed by backward advection, at 8 locations along a meridional axis (the mean wave propagation direction)
Fig. 3 Swell interacting with a low-resolution (32 × 32) deterministic SQG current plus (one realization of) the time-uncorrelated stochastic model. Ray trajectories are computed by forward advection and superimposed on the low-resolution current vorticity $\bar{\omega} = \nabla^\perp \cdot \bar{v}$
Modified (Hyper-)Viscosity for Coarse-Resolution Ocean Models


Louis Thiry, Long Li, and Etienne Mémin 


Abstract We present a simple parameterization for coarse-resolution ocean 
models. To replace computationally expensive high-resolution ocean models, we 
develop a computationally cheap parameterization for coarse-resolution models 
based solely on the modification of the viscosity term in advection equations. 
It is meant to reproduce the mean quantities like pressure, velocity, or vorticity 
computed from a high-resolution reference solution or using observations. We 
test this new parameterization on a double-gyre quasi-geostrophic model in the 
eddy-permitting regime. Our results show that the proposed scheme improves 
significantly the energy statistics and the intrinsic variability on the coarse mesh. 
This method shall serve as a deterministic basis model for coarse-resolution 
stochastic parameterizations in future works. 


1 Introduction 


Ocean general circulation models used at climatic scales are limited, for evident
computational reasons, to horizontal resolutions too coarse to correctly resolve ocean
mesoscale and sub-mesoscale eddies, even with large computational infrastructure.
The horizontal resolution of the most recent climatic ocean models is of the order
of the Rossby radius of deformation. These models are hence in the so-called eddy-permitting
regime and can partially resolve the mesoscale (i.e. 10-100 km) eddy
field. These models however suffer from strong limitations. In particular, they are
unable to reproduce accurately large-scale structures such as the eastward turbulent 
jet in an idealized double-gyre configuration. 

Recent parameterizations have shown significant improvements in coarse-resolution
models compared to high-resolution reference solutions [2]. However, this
remains an important topic of research, as the current generation of parameterizations
is not completely able to represent the effects of the unresolved scales on the large-scale
flow structures.

A wide range of subgrid parametrizations relies on eddy viscosity such as 
Laplacian and biharmonic schemes [16, 10, 4, 3]. It has been shown in [9] that 
including only these (hyper)viscosities in coarse-resolution models often causes too
much dissipation and results in an artificial energy sink at large scales. In general,
even eddy-permitting models are not energetic enough and, as a result, the long-time
average of any coarse model's variable of interest departs completely from the long-time
average of high-resolution models subsampled at the same scale. This is
the main motivation of the present work. In particular, we would like to answer the
following question: how can we reduce the excessive resolved kinetic energy loss 
due to the viscosity while simultaneously ensuring numerical stability? 

We propose a simple affine parameterization of (hyper)viscosity. The
(bi)Laplacian operator $\Delta^2 f$ is replaced by $\Delta^2 (f - f')$, where $f'$ is a field of the
same dimension as $f$ that does not depend upon time. We interpret this method
as a mathematical regularization technique to guide the solutions towards prior 
information. We frame f’ as the solution of an optimal control problem to reproduce 
statistics computed from a reference solution or observations. We present a method 
to solve this optimal control problem. 

We test the proposed method with an idealized double-gyre configuration. For 
that purpose, we release with this article a fast, concise, and CPU-GPU portable 
Pytorch implementation of a multi-layer quasi-geostrophic model on a rectangular 
domain. We implement and test our optimization procedure within this setting. 

This article is organized as follows: we present in Sect. 2 the double gyre quasi- 
geostrophic model we use and detail its implementation, we present in Sect. 3 our 
modified viscosity parameterization and we show and discuss numerical results in 
Sect. 4. 


2 Double Gyre Quasi-Geostrophic Model 


2.1 Governing Equations 


We use the same multi-layer quasi-geostrophic model in a non-periodic rectangular 
domain as in [6]. Here, we only give a brief review of this system. The quasi- 
geostrophic pressure and potential vorticity (PV) are stacked in three isopycnal 
layers. We adopt vector forms to denote the layered pressure and potential vorticity
(PV):

$$ p = (p_1, p_2, p_3)^T, \qquad q = (q_1, q_2, q_3)^T. $$

The forced and damped quasi-geostrophic (QG) equations can then be written as


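$$ \partial_t q = \frac{1}{f_0} J(q, p) + f_0 B e + \frac{1}{f_0} (a_2 \Delta - a_4 \Delta^2)(\Delta p), \qquad (1) $$

$$ (\Delta - f_0^2 A) p = f_0 q - f_0 \beta (y - y_0), \qquad (2) $$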


where $\Delta = \partial_x^2 + \partial_y^2$ denotes the horizontal Laplacian, $\Delta^2$ the bi-Laplacian operator,
$J(a, b) = \partial_x a \, \partial_y b - \partial_x b \, \partial_y a$ stands for the Jacobi operator, $f_0 + \beta (y - y_0)$ is the
Coriolis parameter under the beta-plane approximation with the meridional axis center
$y_0$, and $a_2$ and $a_4$ are the Laplacian and biharmonic viscosity coefficients. Parameters
of the configuration are listed in Tables A.1 and A.2 in the Appendix. Besides, the
second term on the right-hand side of Eq. (1) represents the external forcing applied
on different layers. In this work, we only consider an idealized case in which the
ocean basin is driven by a stationary and symmetric wind stress $\tau = (\tau^x, \tau^y)$ on
the surface and by a linear Ekman stress at the bottom. In that case, the forcing term
can be specified by


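$$ B = \begin{pmatrix} \frac{1}{H_1} & -\frac{1}{H_1} & 0 & 0 \\ 0 & \frac{1}{H_2} & -\frac{1}{H_2} & 0 \\ 0 & 0 & \frac{1}{H_3} & -\frac{1}{H_3} \end{pmatrix}, \quad e = \begin{pmatrix} \frac{\partial_x \tau^y - \partial_y \tau^x}{f_0} \\ 0 \\ 0 \\ \frac{\delta_{ek}}{2 |f_0|} \Delta p_3 \end{pmatrix}, \quad \tau = \tau_0 \begin{pmatrix} -\cos(2 \pi y / L_y) \\ 0 \end{pmatrix}, $$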


where $\tau_0$ is the magnitude of the surface wind stress, $H_k$ is the background thickness of layer
$k$, and $\delta_{ek}$ is the bottom Ekman layer thickness. The vertical stratification level of
such a model is described by the term $-f_0^2 A p$ in Eq. (2) with


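$$ A = \begin{pmatrix} \frac{1}{H_1 g'_{1.5}} & -\frac{1}{H_1 g'_{1.5}} & 0 \\ -\frac{1}{H_2 g'_{1.5}} & \frac{1}{H_2} \left( \frac{1}{g'_{1.5}} + \frac{1}{g'_{2.5}} \right) & -\frac{1}{H_2 g'_{2.5}} \\ 0 & -\frac{1}{H_3 g'_{2.5}} & \frac{1}{H_3 g'_{2.5}} \end{pmatrix}, $$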


where $g'_{k+0.5}$ is the reduced gravity defined across the interface between layers $k$
and $k + 1$. A multi-layered generalization of this model can be found in [5]. Note
also that such a multi-layered model can be considered as a vertically discretized
approximation of the continuously stratified QG system [17] with $\partial_z (f_0^2 \partial_z p / N^2) \approx
-f_0^2 A p$ approximated by finite differences, and in which $N$ denotes the buoyancy
(or Brunt-Väisälä) frequency.
2.2 Pytorch Implementation


To facilitate numerical developments and benefit from built-in automatic differentiation,
we develop a Pytorch [12] implementation of the above-described multilayer
QG model.¹ For this purpose, we rigorously follow the strategy of [7]:


1. We use a regular numerical grid with finite differences.

2. We solve the PV advection equation (1) on the whole domain except the
boundaries. We use a standard 5-point finite difference scheme for the (bi-)Laplacian
and the energy-enstrophy conservative Arakawa-Lamb scheme for
the Jacobian [1].

3. We apply a vertical change of coordinate to Eq. (2) which becomes a set of
three inhomogeneous Helmholtz equations. We solve these equations with the 
spectral Discrete Sine Transform (DST) method, and we add corresponding 
homogeneous Helmholtz equation solutions to ensure mass conservation. 

4. We update the boundary values of the potential vorticity q using Eq. (2). 
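The sine-transform solve mentioned in step 3 can be illustrated in one dimension. The following pure-Python sketch is not the released implementation (which, per the text, works on 2D fields in Pytorch); it only shows the principle: with homogeneous Dirichlet boundaries, the type-I discrete sine transform diagonalizes the 3-point Laplacian, so a Helmholtz equation $(\Delta - \lambda) u = f$ reduces to a division in spectral space. Function names, grid size and $\lambda$ are assumptions made for the example.

```python
import math


def dst1(values):
    """Type-I discrete sine transform (unnormalized), O(n^2) for clarity."""
    n = len(values)
    return [
        sum(values[j] * math.sin(math.pi * (m + 1) * (j + 1) / (n + 1)) for j in range(n))
        for m in range(n)
    ]


def idst1(coeffs):
    """Inverse of dst1: the DST-I is its own inverse up to the factor 2 / (n + 1)."""
    n = len(coeffs)
    return [2.0 / (n + 1) * c for c in dst1(coeffs)]


def solve_helmholtz_1d(rhs, dx, lam):
    """Solve (Laplacian - lam) u = rhs on interior points, with u = 0 at both boundaries."""
    n = len(rhs)
    rhs_hat = dst1(rhs)
    u_hat = []
    for m in range(n):
        # Eigenvalue of the 3-point Laplacian for the m-th sine mode
        eig = (2.0 * math.cos(math.pi * (m + 1) / (n + 1)) - 2.0) / dx**2
        u_hat.append(rhs_hat[m] / (eig - lam))
    return idst1(u_hat)


def apply_helmholtz_1d(u, dx, lam):
    """Apply the discrete operator (Laplacian - lam) with zero boundary values."""
    padded = [0.0] + list(u) + [0.0]
    return [
        (padded[j - 1] - 2.0 * padded[j] + padded[j + 1]) / dx**2 - lam * padded[j]
        for j in range(1, len(u) + 1)
    ]


n_points, dx, lam = 31, 1.0 / 32, 4.0
rhs = [math.sin(3.0 * (j + 1) * dx) + 0.1 * (j + 1) for j in range(n_points)]
solution = solve_helmholtz_1d(rhs, dx, lam)
residual = max(abs(a - b) for a, b in zip(apply_helmholtz_1d(solution, dx, lam), rhs))
print("max residual:", residual)
```

For positive $\lambda$ the denominator never vanishes because all Laplacian eigenvalues are negative. A production code would use a fast transform instead of the quadratic-cost sums used here.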


Detailed equations and numerical routine design choices can be found in [7]. We use
a Heun (second-order Runge-Kutta) time-stepping scheme instead of the leap-frog
time scheme used by [7].
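For reference, Heun's scheme is a two-stage predictor-corrector method. The sketch below shows one generic step for a system $dy/dt = F(y)$ stored as a list of floats; the function name and the test equation are illustrative choices, not taken from the released code.

```python
def heun_step(rhs, state, dt):
    """One Heun (RK2) step: Euler predictor followed by a trapezoidal corrector."""
    slope_start = rhs(state)
    predictor = [s + dt * d for s, d in zip(state, slope_start)]
    slope_end = rhs(predictor)
    return [s + 0.5 * dt * (a + b) for s, a, b in zip(state, slope_start, slope_end)]


def linear_decay(state):
    """Test right-hand side dy/dt = -y."""
    return [-value for value in state]


# One step of size 0.1 from y = 1 reproduces the Taylor expansion 1 - dt + dt^2 / 2.
print(heun_step(linear_decay, [1.0], 0.1))
```

Unlike the leap-frog scheme, this one-step method needs no special start-up step and has no computational mode, at the cost of two right-hand-side evaluations per step.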

For the sake of numerical efficiency, we follow the recommendation of [14]:
we compile computationally demanding routines and simplify finite difference
calculations by reducing as much as possible the number of multiplications. We end
up with a very concise code (less than 300 lines) that only depends upon the Numpy
and Pytorch libraries. This implementation will be open-sourced at the time of
publication.


2.3 Eddy-Resolving and Eddy-Permitting Regimes 


We consider two spatial settings for our simulations: 


1. The eddy-resolving regime, our high-resolution reference with a 5 km resolution. 
2. The eddy-permitting regime, our low-resolution setting with a 40 km resolution. 


Parameters for these two different regimes are listed in Table A.2 in the Appendix.

Shevchenko and Berloff [15] studied the differences between the flows in
these two regimes. The high-resolution eddy-resolving model shows a well-pronounced
eastward jet fuelled by circulating mesoscale eddies, while the
low-resolution eddy-permitting model does not produce a proper eastward jet, as
shown in Fig. 1. Temporal statistics significantly differ between high- and low-resolution
simulations.


¹ Available at https://github.com/louity/qgm_pytorch.
Fig. 1 (Top) high-resolution and (bottom) low-resolution top-layer snapshots after 400 years of
integration starting from zero velocity. Velocities are in m s$^{-1}$ and PV in s$^{-1}$


3 Proposed Modified Viscosity 


3.1 Motivation 


In both resolutions, we use biharmonic viscosity as in [16, 10, 4, 3], essentially
because it is less dissipative at large scales than a Laplacian viscosity and therefore
better preserves large-scale structures. However, hyperviscosity
remains much too dissipative in the “eddy-permitting” regime [9]. This excessive
dissipation kills the eastward jet that is present in the high-resolution solution and that we
expect to see in such a double-gyre quasi-geostrophic model. Figure 2 shows a
sequence of snapshots of the low-resolution model initialized with a downsampled
snapshot of the high-resolution solution (see Appendix for details on downsampling). After
as few as three years, the eastward jet has almost disappeared, showing that the
model is too dissipative. Lowering the hyper-viscosity coefficient by a factor of 10
does not solve this problem, and creates spurious gradients in the potential vorticity
as shown in Fig. 2. These numerical artifacts are due to a poor representation of the
direct enstrophy cascade, causing a piling up of the small-scale vorticity gradients
at the cut-off frequency together with aliasing effects.
Fig. 2 (left) Initial condition: high-resolution snapshot on the low-resolution grid. (center and
right) Zonal velocity and potential vorticity (PV) snapshots after 3 years of integration at low-resolution
with Eqs. (1, 2) with (top) standard hyper-viscosity and (bottom) 10 times smaller
hyper-viscosity. We can see aliasing effects on potential vorticity snapshots integrated with low
hyper-viscosity


3.2 Modified Viscosity 


Here we propose a simple affine modification of the hyperviscosity parameterization.
We add a bias to the term $\Delta p$ in Eq. (1), which becomes $\Delta (p - p')$, where $p'$ is a
field of the same dimension as $p$ that does not depend upon time. The PV advection equation with
hyperviscosity becomes


$$ \partial_t q = \frac{1}{f_0} J(q, p) + f_0 B e + \frac{1}{f_0} (a_2 \Delta - a_4 \Delta^2) (\Delta (p - p')). \qquad (3) $$


The elliptic equation (2) remains unchanged. 

The goal of this additional term is to reproduce a relevant time-average pressure
field relying on observations or high-resolution solutions. For example, the high-resolution
average $\bar{p}_{HR}$ can be downsampled to the targeted coarse grid resolution
into $\bar{p}_{HR\downarrow}$, and we want the average $\bar{p}_{LR}$ of the modified low-resolution model to be
as close as possible to the high-resolution reference $\bar{p}_{HR\downarrow}$.

We face here an optimal control problem, as the low-resolution average is a
function of the control parameter $p'$. We state it with the following least-square
formulation

$$ p'_{opt} = \operatorname{argmin}_{p'} F(p'), \qquad (4) $$
$$ F(p') = \| \bar{p}_{LR}(p') - \bar{p}_{HR\downarrow} \|^2. \qquad (5) $$

This optimization problem is a priori non-convex and we should not expect to find
a global optimum. In the following, we propose a numerical procedure to find a
heuristic approximation $p'$ of the optimal solution $p'_{opt}$.

The implementation of this modified hyperviscosity is simple
and computationally cheap. We precompute $\Delta p'$ and subtract it from $\Delta p$ at each
time-integration step. It increases the integration time of the advection equation (1)
by less than 1% on CPUs and GPUs.
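The "precompute and subtract" idea can be sketched on a single layer with plain Python lists. This is a simplified illustration and not the released Pytorch code: the function names are hypothetical, the 5-point Laplacian is evaluated on interior points only with zeros kept on the boundary (an assumption made here for brevity), and the coefficients are placeholder values.

```python
import math


def laplacian(field, dx):
    """5-point Laplacian on interior points; boundary values of the output are left at zero."""
    ny, nx = len(field), len(field[0])
    out = [[0.0] * nx for _ in range(ny)]
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            out[j][i] = (
                field[j][i - 1] + field[j][i + 1] + field[j - 1][i] + field[j + 1][i]
                - 4.0 * field[j][i]
            ) / dx**2
    return out


def modified_viscous_term(lap_p, lap_p_prime, a2, a4, f0, dx):
    """Evaluate (1 / f0) (a2 Lap - a4 Lap^2) applied to (Lap p - Lap p')."""
    biased = [[a - b for a, b in zip(row_p, row_b)] for row_p, row_b in zip(lap_p, lap_p_prime)]
    first = laplacian(biased, dx)
    second = laplacian(first, dx)
    return [
        [(a2 * l1 - a4 * l2) / f0 for l1, l2 in zip(row1, row2)]
        for row1, row2 in zip(first, second)
    ]


# Toy pressure field on a 7 x 7 grid (assumed values).
size, dx = 7, 1.0
p = [[math.sin(0.7 * i) * math.cos(0.4 * j) for i in range(size)] for j in range(size)]
lap_p = laplacian(p, dx)
lap_p_prime = laplacian(p, dx)  # computed once, before time stepping; here p' = p

term = modified_viscous_term(lap_p, lap_p_prime, a2=0.0, a4=1.0, f0=1.0e-4, dx=dx)
print("largest |term| when p equals p':", max(abs(v) for row in term for v in row))
```

When the solution coincides with the reference field the dissipative term vanishes, which is the sense in which the modification guides the solution toward $p'$ rather than toward zero. Setting `lap_p_prime` to zeros recovers the standard (hyper)viscous term.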


3.3 Modified Viscosity Regularization 


The continuously stratified QG equations can be rewritten in a variational formulation
[8] with a Hamiltonian $\mathcal{H}$ defined as


$$ \mathcal{H}(p) = \frac{1}{2} \int_\Omega \|\nabla p\|^2 + \frac{f_0^2}{N^2} (\partial_z p)^2. $$

Our model is a discretized version of the continuous stratification. Since we add an 
external wind forcing term and we use an energy conservative Arakawa advection 
scheme, we need to add some viscosity or hyperviscosity to dissipate energy. 
In a variational formulation, these (hyper-)viscous terms become the following 
penalization 


$$ \frac{1}{2} \int_\Omega a_2 |\Delta p|^2 + a_4 |\nabla (\Delta p)|^2, $$


added to the Hamiltonian $\mathcal{H}(p)$ to produce a smooth solution. The gradient norm
penalization of the Laplacian of $p$ guides the minimization toward solutions with a smooth
Laplacian. Hyperviscosity corresponds to the Laplacian norm penalization and
enforces a solution of minimum Laplacian norm. The parameters $a_2$ and $a_4$ quantify
the strength of these regularization constraints.

Here, we simply propose to replace it with the following penalization 


$$ \frac{1}{2} \int_\Omega a_2 |\Delta (p - p')|^2 + a_4 |\nabla (\Delta (p - p'))|^2. $$


We now penalize (p — p’) instead of p, meaning that we guide the solution to a 
possibly non-smooth reference p’ that will produce the correct large scale behavior. 
3.4 Iterative Procedure


Here we present a method to find a solution to the optimization problem (4). A
natural guess for $p'_{opt}$ is $\bar{p}_{HR\downarrow}$. We solve the equations and compute the average
pressure $\bar{p}_{LR}$. Results are shown in Fig. 4. It is a good first guess, but the difference
$\bar{p}_{HR\downarrow} - \bar{p}_{LR}$ is still large.

We propose the following iterative procedure to find a better guess for $p'_{opt}$. In
the following we assume that we are in low resolution, i.e. $p = p_{LR}$ and $\bar{p} = \bar{p}_{LR}$
unless explicitly written.


• We set $p'_0 = 0$ and we compute the average pressure $\bar{p}_0$ by solving the standard equations
(1, 2) without modified viscosity.

• Choose $k \in ]0, 1]$.

• Start with $p'_1 = \bar{p}_{HR\downarrow}$.

• Evolve the ensemble for n years and compute the corresponding average pressure
$\bar{p}_1$ with ensemble average.

• For $n = 1, 2, \ldots$:

  - Set $p'_{n+1} = p'_n + k (\bar{p}_{HR\downarrow} - \bar{p}_n)$.
  - Evolve the ensemble for n years and compute the new average pressure $\bar{p}_{n+1}$.

• Return $p'_n$ and $\bar{p}_n$.


There is no theoretical guarantee that this procedure converges, but we observe 
in the next section that it converges with the double-gyre QG model that we use. 
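
The update rule of the procedure can be sketched in a few lines. This is a toy illustration under stated assumptions, not the QG experiment: `mean_response` stands in for "evolve the ensemble and compute the average pressure", and `toy_mean_response` is a hypothetical linear surrogate chosen only so that the example runs. All names are invented for the sketch.

```python
import numpy as np

def relative_square_error(p_bar, p_target):
    """Relative square error ||p_bar - p_target||^2 / ||p_target||^2."""
    return np.sum((p_bar - p_target) ** 2) / np.sum(p_target ** 2)

def iterate_reference(mean_response, p_target, k=0.7, n_iter=20):
    """Relaxed fixed-point search for a reference p' whose mean response matches p_target."""
    p_ref = p_target.copy()        # first guess: the downsampled high-resolution mean
    p_bar = mean_response(p_ref)   # placeholder for the ensemble run and averaging
    errors = [relative_square_error(p_bar, p_target)]
    for _ in range(n_iter):
        p_ref = p_ref + k * (p_target - p_bar)   # update with relaxation factor k
        p_bar = mean_response(p_ref)
        errors.append(relative_square_error(p_bar, p_target))
    return p_ref, p_bar, errors

def toy_mean_response(p_ref):
    """Hypothetical surrogate: the mean state responds linearly to the reference."""
    return 0.6 * p_ref + 0.1

p_target = np.linspace(-1.0, 1.0, 50)
p_ref, p_bar, errors = iterate_reference(toy_mean_response, p_target)
```

For this linear surrogate the error contracts by a constant factor at every iteration. The real model is nonlinear and chaotic, so, as stated above, no such guarantee carries over.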


4 Results and Discussion 


4.1 Statistics 


We use ensemble averages to compute the statistics. To create ensembles of size N,
we start from a zero solution and spin up the models for 100 years with a timestep
of 1200 s to reach statistically steady states as in [13]. Then we run the models for
500 years and save 10 snapshots a year to get 5000 snapshots, and we randomly
select N snapshots out of these 5000 snapshots. The ensemble averages are simply
averages over these N ensemble members, which we evolve in parallel. Such ensemble
averages are denoted with an overbar in the following, i.e. the average pressure is
denoted by $\bar{p}$, the average velocity by $\bar{u}$, etc.
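
The ensemble construction can be sketched as follows. The snapshot array here is synthetic random data on a tiny grid, and the function names, the seed handling and the choice to sample without replacement are assumptions made for this example (the text does not say whether snapshots may repeat).

```python
import numpy as np

def sample_ensemble(snapshots, n_members, seed=0):
    """Randomly pick n_members distinct snapshots as ensemble initial states."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(snapshots.shape[0], size=n_members, replace=False)
    return snapshots[idx]

def ensemble_average(members):
    """Average over the ensemble dimension (axis 0)."""
    return members.mean(axis=0)

n_snapshots = 500 * 10   # 500 years, 10 snapshots saved per year
snapshots = np.random.default_rng(1).standard_normal((n_snapshots, 4, 5))
members = sample_ensemble(snapshots, n_members=50)
p_bar = ensemble_average(members)
```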
[Plot title: Relative square errors with iterative procedure]
Fig. 4 Top-layer average pressure (top) and velocity (bottom) of (left-to-right) the proposed
model at low resolution, the reference, and the difference between the two


4.2 Iterative Procedure 


We test the iterative procedure described in Sect. 3.4 with the double-gyre model
presented in Sect. 2 in the eddy-permitting regime. We use n = 10 years to evolve
the ensemble after each iterate. We compute the reference pressure average $\bar{p}_{HR}$
with the same model in the eddy-resolving regime.

Figure 3 shows the relative square error $\|\bar{p}_n - \bar{p}_{HR\downarrow}\|^2 / \|\bar{p}_{HR\downarrow}\|^2$ at successive
iterations of the procedure with k = 1 and k = 0.7. The procedure converges with k = 0.7
and oscillates with k = 1.

Figure 4 shows the output average pressure $\bar{p}_n$ of the iterative procedure, the
reference $\bar{p}_{HR\downarrow}$ and the difference between the two, as well as the same quantities
for the zonal velocity $\bar{u}$.
Our model can reproduce the eastward jet produced by the high-resolution reference
Fig. 5 Average top-layer kinetic-energy spectra of the models at high resolution (HR), at low
resolution (LR) and at low resolution with the proposed modified viscosity. The decreasing slope
of the spectrum of the proposed model is much closer to the high-resolution reference
Fig. 6 PV and zonal velocity snapshots from (left-to-right) the high-resolution model, the
low-resolution model and the proposed model at low resolution


model. The kinetic energy spectra in Fig. 5 also show the improvement of our
model compared to the low-resolution model. Finally, Fig. 6 shows high-resolution and
low-resolution snapshots as well as a snapshot of the proposed model at low resolution.
Our model effectively produces the eastward jet and a recirculation zone around it
where eddies are created. Artifacts can also be observed in the zonal velocity and
potential vorticity on the right of Fig. 6. They are likely Rossby waves created
by the harmonic regularization terms, which remain an artificial constraint, but this
needs to be studied further.

5 Conclusion


We presented a simple modified-viscosity scheme for coarse-resolution ocean
modeling that we derived and tested on a double-gyre multi-layer quasi-geostrophic
model. We interpret it as a modified regularization technique that guides the
solution toward a reference rather than producing an overly smooth solution in the
eddy-permitting regime. The technique requires solving an optimization problem, and we
presented a procedure to find a good guess for the solution. We showed that it
converges to a reasonable solution that fairly reproduces the input reference.

Although this method mimics the average of the high-resolution model, it only partially
reproduces the variability and higher-order statistics of the high-resolution model. We see
in Fig. 6 that our model's snapshots resemble the averages. In future work, we plan
to use this method as a deterministic basis for stochastic parameterizations such as
Location Uncertainty [11].


Appendix 
Downsampling Procedure 


Downsampling the high-resolution solution on a low-resolution grid consists of 
interpolating the high-resolution (769 x 961) streamfunction on the low-resolution 
(97 x 121) grid. Then we can compute the potential vorticity using Eq. (2). Because 
of the no-flow constraint, the downsampled streamfunction should be constant on 
the boundaries and should satisfy a mass conservation constraint [7]. We also want 
to preserve the frequency information and prevent aliasing. 

We use the following procedure: 


1. We apply a Gaussian filter and downsample the streamfunction on the domain,
except on the boundaries.

2. We add homogeneous solutions of the elliptic equation (2) to the streamfunction
in order to satisfy the mass conservation constraint, as in [7].
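
Step 1 can be sketched with NumPy alone. Going from 769 x 961 to 97 x 121 points corresponds to keeping every 8th point in each direction (768/96 = 960/120 = 8). The filter width `sigma = factor / 2`, the edge padding and the choice to copy unfiltered boundary values are assumptions of this sketch, and the mass-conservation correction of step 2 is not included.

```python
import numpy as np

def gaussian_kernel(sigma, truncate=3.0):
    """Normalized 1D Gaussian weights."""
    radius = int(truncate * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    w = np.exp(-0.5 * (x / sigma) ** 2)
    return w / w.sum()

def smooth_axis(field, kernel, axis):
    """Filter along one axis, padding with edge values so the shape is preserved."""
    radius = len(kernel) // 2
    pad = [(0, 0)] * field.ndim
    pad[axis] = (radius, radius)
    padded = np.pad(field, pad, mode="edge")
    return np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="valid"), axis, padded)

def downsample(psi, factor=8, sigma=None):
    """Gaussian low-pass filter followed by decimation; boundary values are left unfiltered."""
    sigma = factor / 2.0 if sigma is None else sigma   # assumed filter width
    kernel = gaussian_kernel(sigma)
    smooth = smooth_axis(smooth_axis(psi, kernel, 0), kernel, 1)
    coarse = smooth[::factor, ::factor].copy()
    coarse[0, :], coarse[-1, :] = psi[0, ::factor], psi[-1, ::factor]
    coarse[:, 0], coarse[:, -1] = psi[::factor, 0], psi[::factor, -1]
    return coarse
```

Filtering before decimation is what limits aliasing. The cutoff would need to be tuned against the spectra of the actual high-resolution fields.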

Parameter Tables


L. Thiry et al. 


Table A.1 Common parameters for all the models


Parameter    Value                      Description
L_x × L_y    (3840 × 4800) km           Domain size
H_k          (350, 750, 2900) m         Mean layer thickness
g'_k         (0.025, 0.0125) m s^-2     Reduced gravity
δ_ek         2 m                        Bottom Ekman layer thickness
τ_0          2 × 10^-5 m^2 s^-2         Wind stress magnitude
a_2          0 m^2 s^-1                 Laplacian viscosity coefficient
f_0          9.375 × 10^-5 s^-1         Mean Coriolis parameter
β            1.754 × 10^-11 (m s)^-1    Coriolis parameter gradient


Table A.2 Grid-dependent parameters


Grid dimensions    Resolution    Timestep    Hyperviscosity (a_4)
769 × 961          5 km          600 s       2.0 × 10^9 m^4 s^-1
97 × 121           40 km         1200 s      5.0 × 10^11 m^4 s^-1

Primitive Equations Under Location
Uncertainty: Analytical Description and
Model Development


Francesco L. Tucciarone, Etienne Mémin, and Long Li 


Abstract Resolving numerically all the scale interactions of ocean dynamics in a
high-resolution realistic configuration is today far beyond reach, and only large-scale
representations can be afforded. In this work, we study a stochastic parameterization
of the ocean primitive equations derived within the modelling under location
uncertainty framework. First numerical assessments built with the NEMO core
code are provided for a double-gyre configuration.


Keywords Stochastic parametrization - Ocean modelling 


1 Introduction 


The ocean covers a major part of Earth's surface and has an important stabilizing
effect on the climate. For climate prediction, accurate ensemble forecasts
of likely future ocean states are consequently essential. However, owing to evident
computational limitations, high-resolution simulations are infeasible and
only large-scale ocean representations can be handled. To face this difficulty, and
the need to generate different likely future scenarios, there has been a growing
interest in the geophysical sciences in setting up flow models that incorporate in their
dynamics noise terms related to uncertainties or errors. By accounting for the action
of unresolved processes in a random way, these stochastic models are in general
less diffusive than the classical large-scale deterministic models. The unresolved
processes include small-scale turbulence effects, boundary value uncertainties, or
uncertainties coming either from scale coarsening or from the numerical schemes
used. Moreover, compared to classical large-scale deterministic modelling, the
additional degree of freedom brought by the stochastic component allows us to
devise new intermediate models [3, 4, 6, 7, 8]. The addition of noise in fluid
dynamics models cannot be done in a haphazard manner. Ad-hoc choices for
model noise can fundamentally perturb the corresponding fluid dynamics models, 
making them exhibit unrealistic properties [3]. Rigorously justified methodologies 
for choosing the model noise have recently been introduced by Mémin [1] and Holm 
[2]. These derivations lead to large classes of stochastic geophysical fluid dynamics 
models that preserve either energy or circulation, respectively. Such models natu- 
rally emerge from a decomposition of the flow velocity field in terms of a smooth 
component and a time uncorrelated uncertainty random term. This decomposition 
is reminiscent, in spirit, of the classical Reynolds decomposition, and enables the 
definition of a large-scale representation with a stochastic term representing
small-scale effects. The Location Uncertainty (LU) formulation has been found to be
more accurate in structuring the large-scale flow [4] and in reproducing long-term
statistics [22] for the barotropic quasi-geostrophic model. It also provides a good
trade-off between model error representation and ensemble spread [21, 23] for the 
rotating shallow water model and the surface quasi-geostrophic model. In this work 
we explore more specifically a stochastic version of the primitive equations, named 
primitive equations under Location Uncertainty. The derivation of this model is 
detailed and first numerical experiments built from the NEMO code are assessed. 


2 Location Uncertainty (LU) 


In the LU formalism, the Lagrangian displacement $X_t$ associated with a fluid particle
is decomposed as:


$$X_t(x_0) = X_0(x_0) + \int_0^t v(X_s(x_0), s)\,ds + \int_0^t \sigma(X_s(x_0), s)\,dB_s, \qquad (1)$$


where $X : \Omega \times \mathbb{R}^+ \to \Omega$ is the fluid flow map, that is the trajectory followed by fluid
particles starting at the initial position $X|_{t=0}(x_0) = x_0$ of the bounded domain $\Omega \subset \mathbb{R}^3$.
Written in differential form, Eq. (1) takes the usual form:


$$dX_t(x_0) = v(X_t, t)\,dt + \sigma(X_t, t)\,dB_t. \qquad (2)$$


The first component, $v(X_t, t)$, represents the smooth, resolved velocity field of the
flow. It corresponds to the integration of the equations of motion, solved on a
grid of a given resolution, and it is assumed to be both spatially and temporally
correlated. The second term, $\sigma(X_t, t)\,dB_t$, is a stochastic process that gathers
the unresolved flow component, uncertainties on the flow and turbulent effects. This
stochastic contribution, often referred to as noise in the following, is built from the
application of a Hilbert-Schmidt kernel integral operator, $\sigma$, to an $\mathbb{I}_3$-cylindrical
Wiener process $B$:


$$(\sigma(X_t, t)\,dB_t)^i = \int_\Omega \breve{\sigma}_{ik}(X_t, y, t)\,dB_t^k(y)\,dy, \qquad (3)$$


where $B$ is defined on a filtered probability space $(\Omega, \mathcal{F}, \mathbb{P}, (\mathcal{F}_t)_t)$ and $(\mathcal{F}_t)_t$ is
the filtration adapted to $B$. The application of the (integrable) kernel $\breve{\sigma}$ imposes
fast/small-scale spatial correlation and defines a centred Gaussian process
$\sigma\,dB_t \sim \mathcal{N}(0, Q\,dt)$, with covariance tensor defined as


$$Q_{ij}(x, y, t, s) = \mathbb{E}\left[(\sigma(x, t)\,dB_t)^i\,(\sigma(y, s)\,dB_s)^j\right]
= \delta(t - s)\,dt \int_\Omega \breve{\sigma}_{ik}(x, z, t)\,\breve{\sigma}_{kj}(z, y, s)\,dz.$$


The strength of the noise is measured by the diagonal components of the covariance
tensor per unit of time, i.e. the variance tensor $a$, defined by
$a(x, t)\,\delta(t - t')\,dt = Q(x, x, t, t')$. The variance tensor is symmetric and positive
definite at any point $x$ of the domain. Notably, it has the dimension of a viscosity,
i.e. m$^2$ s$^{-1}$. The covariance operator is self-adjoint, positive definite and compact,
and admits a convenient spectral decomposition.
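
To make the decomposition (2) concrete, the following sketch integrates it with an Euler-Maruyama scheme in the simplest possible setting: one spatial dimension with constant drift `v` and constant noise amplitude `sigma`, so that the variance tensor reduces to the scalar `a = sigma**2`. These simplifications, the parameter values and the function name are assumptions of the example and have nothing to do with the ocean configuration studied here.

```python
import numpy as np

def simulate_displacement(x0, v, sigma, dt, n_steps, n_particles, seed=0):
    """Euler-Maruyama integration of dX = v dt + sigma dB for many particles."""
    rng = np.random.default_rng(seed)
    x = np.full(n_particles, float(x0))
    for _ in range(n_steps):
        dB = rng.normal(0.0, np.sqrt(dt), size=n_particles)  # Brownian increment
        x = x + v * dt + sigma * dB
    return x

# One time unit of integration: the mean should be close to v*t = 0.1 and the
# variance close to a*t = sigma**2 * t = 0.25.
x_final = simulate_displacement(x0=0.0, v=0.1, sigma=0.5, dt=0.01,
                                n_steps=100, n_particles=20000)
```

The sample variance of the particle positions grows linearly in time at rate `a`, which is why the variance tensor carries the units of a diffusivity.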

In this paper, the noise will always be assumed to be centred, but it can be proven
through the Girsanov theorem that one can redefine the Lagrangian displacement (2) as


$$dX_t(x_0) = [v(X_t, t) - \mu_t(X_t)]\,dt + \sigma\,d\tilde{B}_t(X_t), \qquad (4)$$


where the Wiener process $\tilde{B}_t$ is a centred process under a new probability measure
$\mathbb{Q}$ drifted by $\mu_t$. Indeed, a non-centred Wiener process shifted by a random process
$(Y_t)_t$ can be defined as:


$$\tilde{B}_t = B_t + \int_0^t Y_s\,ds. \qquad (5)$$


Under suitable properties of $(Y_t)_t$ ($\mathcal{F}_t$-measurability, almost sure $L^2$-integrability and
the Novikov condition) there exists a measure $\mathbb{Q}$ such that $(\tilde{B}_t)_t$ is a $\mathbb{Q}$-Wiener process.
With the non-centred random process $\tilde{B}_t$ we can rewrite the equations with respect
to $B_t$ as


$$\sigma\,dB_t(X_t) = \sigma\,d\tilde{B}_t(X_t) - \sigma(X_t, t)\,Y_t\,dt. \qquad (6)$$


Denoting $\sigma(X_t, t)\,Y_t$ as $\mu_t$, one can write the Lagrangian displacement (2) as (4),
and under $\mathbb{Q}$ the Wiener process $\tilde{B}_t$ is centred; thus $dX_t$ has the
same form as (2) but under a new measure. All the arguments provided in the
following will hold for this process under $\mathbb{Q}$. The use of a drifted noise $\sigma\,d\tilde{B}_t$ is
fundamental when the processes employed to operationally define the noise are not
centred, hence displaying a non-zero time average.

3 Stochastic Transport Theorem


The derivation of Eulerian flow dynamics models within the LU formalism relies on
a stochastic version of the Reynolds transport theorem (SRTT), introduced in [1],
which describes the rate of change of a random scalar $q$ transported by the stochastic
flow (2) within a flow volume $V_t$:


$$d\int_{V_t} q(x, t)\,dx = \int_{V_t} \{D_t q + q\,\nabla\cdot[v^* dt + \sigma\,dB_t]\}(x, t)\,dx, \qquad (7)$$

with the operator

$$D_t q = d_t q + [v^* dt + \sigma\,dB_t]\cdot\nabla q - \frac{1}{2}\nabla\cdot(a\nabla q)\,dt, \qquad (8)$$


defining the stochastic transport operator. The SRTT is in perfect analogy with the
deterministic Reynolds transport theorem (compare with [13], Section 5.3), and the
various terms can be interpreted physically. Proceeding in order, the first right-hand
side term of (8) is the increment in time at a fixed location of the process $q$, that
is $d_t q = q(X_t, t + dt) - q(X_t, t)$. This contribution plays the role of the partial
time derivative for a process that is not time differentiable. The term enclosed in the
square brackets is a stochastic advection displacement. It involves a time-correlated
modified advection,


$$v^* = v - \frac{1}{2}\nabla\cdot a + \sigma^T(\nabla\cdot\sigma), \qquad (9)$$


and a fast-evolving, time-uncorrelated noise $\sigma\,dB_t$. The advection of the
variable $q$ by this term leads to a multiplicative noise, which is hence non-Gaussian. This
type of noise is often denoted as transport noise in the literature. The second term of the
modified advection is coined the Itô-Stokes drift velocity in [4], $v_s = \frac{1}{2}\nabla\cdot a$.
It represents an effective transport velocity resulting from statistical effects due
to inhomogeneities of the noise term. The last term of the transport operator is a
dissipation term that depicts the mixing mechanism due to the unresolved scales.
Following [5] one can consider the transport of a characteristic function to introduce
an evolution equation for the Jacobian determinant $J$ of the flow:


$$D_t J - J\,\nabla\cdot[(v - v_s + \sigma^T(\nabla\cdot\sigma))\,dt + \sigma\,dB_t] = 0. \qquad (10)$$

This equation provides a clear condition for the stochastic flow to be isochoric:


$$\nabla\cdot[v^* dt + \sigma\,dB_t] = 0. \qquad (11)$$
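
The Itô-Stokes drift $v_s = \frac{1}{2}\nabla\cdot a$ is easy to evaluate once the variance tensor is known on a grid. The sketch below is a two-dimensional finite-difference illustration, not part of the NEMO implementation: the array layout `(2, 2, nx, ny)`, the function name and the use of `np.gradient` are assumptions of the example.

```python
import numpy as np

def ito_stokes_drift(a, dx, dy):
    """Return v_s with v_s[i] = 1/2 * sum_j d(a[i, j])/dx_j for a of shape (2, 2, nx, ny)."""
    d_ax = np.gradient(a, dx, axis=2, edge_order=2)   # derivative along x
    d_ay = np.gradient(a, dy, axis=3, edge_order=2)   # derivative along y
    return 0.5 * (d_ax[:, 0] + d_ay[:, 1])

# Analytic check with the rank-one tensor a = r r^T, r = (x, y):
# v_s = 1/2 * (d_x(x^2) + d_y(x y), d_x(x y) + d_y(y^2)) = 1.5 * (x, y).
x = np.linspace(0.0, 1.0, 11)
y = np.linspace(0.0, 2.0, 21)
X, Y = np.meshgrid(x, y, indexing="ij")
a = np.array([[X * X, X * Y], [X * Y, Y * Y]])
v_s = ito_stokes_drift(a, x[1] - x[0], y[1] - y[0])
```

A spatially uniform variance tensor gives zero drift. Only inhomogeneities of the noise contribute, as stated in the text.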

4 Boussinesq Equations


Under location uncertainty, a stratified ocean can be modelled with a modified
version of the Boussinesq equations. The derivation outlined here follows almost
verbatim the asymptotic derivation given in [12]. First, one applies the SRTT (7) to
the density and imposes conservation, that is $d\int_{V_t} \rho(x, t)\,dx = 0$. Then, assuming
that the fluctuations of density are small compared to the mean,


$$\rho(x, t) = \rho_0\,[1 + \epsilon\,\delta\hat{\rho}(t, x)], \qquad (12)$$


and using $\epsilon$ as an asymptotic ordering parameter to perform an expansion of the
conservation of mass, the first order is found to be:


$$\nabla\cdot[v^* dt + \sigma\,dB_t] = 0, \qquad (13)$$


which can be split into two incompressibility conditions involving the modified drift
velocity $v^*$ and the fast-scale component $\sigma\,dB_t$, thanks to the uniqueness of the
semimartingale decomposition [15]. Applying the SRTT (7) again, this time to the
momentum, yields


$$\rho\,D_t v = -\nabla\left(p - \frac{\mu}{3}\nabla\cdot v\right)dt - \nabla\,dp_t^\sigma - \rho\,g\,e_z\,dt, \qquad (14)$$


where the right-hand side entails pressure forces, compressibility effects [14] and
gravitational forces. The compressibility term $\frac{\mu}{3}\nabla\cdot v$, with $\mu$ the dynamic viscosity of
water, is usually neglected in the deterministic derivation of the Boussinesq model,
but is retained in this model in view of the different incompressibility condition (13),
which enforces $\nabla\cdot v = \nabla\cdot v_s$. Following the classical nondimensionalization
procedure [12, 14], characteristic scales are introduced as:


$$x = L\hat{x}, \quad v = U\hat{v}, \quad t = \tau\hat{t}, \quad p = \frac{\rho_0 U^2}{\epsilon}\hat{p}, \quad g = \frac{U^2}{\epsilon L}\hat{g}, \qquad (15)$$


with $\tau = L/U$ the advective time scale. Furthermore, the variance tensor is assumed to
scale as $a = A\hat{a}$, so that the fast-evolving component $\sigma\,dB_t$ and the kernel $\sigma$ can be
scaled as


$$\sigma\,dB_t = \sqrt{\frac{AL}{U}}\,\hat{\sigma}\,d\hat{B}_t \quad\text{and}\quad \sigma = \sqrt{A}\,\hat{\sigma}. \qquad (16)$$


In this novel framework a non-dimensional parameter $\Upsilon = UL/A$ is introduced
to compare advection and stochastic diffusion terms in the momentum equation.
This parameter is termed the stochastic Peclet number, in perfect analogy with
the deterministic advection-diffusion problem [10]. Introducing these variables,
following [12], one obtains:
[Eq. (17), the non-dimensional form of the momentum equation (14) under the scalings (15)
and (16), is not recoverable from this copy of the text.]


Expanding each variable as an asymptotic series with $\epsilon$ taken as ordering parameter,
Eq. (17) provides at lowest order, once dimensional variables are restored,


$$\nabla p_0 = -\rho_0\,g\,e_z, \qquad p_0(z) = -\rho_0\,g\,z. \qquad (18)$$

Decomposing the density into a background constant density and a deviation
corresponds, for the pressure variable, to a decomposition in terms of a hydrostatic
component and a pressure fluctuation. This splitting,


$$\rho(t, x) = \rho_0 + \rho'(t, x), \qquad p(t, x) = p_0 + p'(t, x), \qquad (19)$$

allows the recognition of the first-order component of the pressure as the deviation $p'$
from the hydrostatic pressure, so that Eq. (17) at first order in dimensional form
becomes


$$d_t v + [(v - v_s)\,dt + \sigma\,dB_t]\cdot\nabla v - \frac{1}{2}\nabla\cdot(a\nabla v)\,dt
= \nabla\left(-p' + \frac{\nu}{3}\nabla\cdot v_s\right)dt - \nabla\left(\frac{dp_t^\sigma}{\rho_0}\right) - \frac{\rho'}{\rho_0}\,g\,e_z\,dt.$$


The splitting (19) also naturally introduces the buoyancy $b = -g\,e_z\,\rho'(t, x)/\rho_0$ in
the equations of motion, representing the upward (or downward) force associated
with the density anomaly $\rho'$. In terms of buoyancy, the momentum equation can be
written as

$$D_t v = \nabla\left(-p' + \frac{\nu}{3}\nabla\cdot v_s\right)dt - \nabla\left(\frac{dp_t^\sigma}{\rho_0}\right) + b\,dt. \qquad (20)$$

A stochastic transport equation can be written for the buoyancy from mass
conservation. However, in this work a tracer transport equation for salinity, $S$, and
temperature, $T$, is preferred, with the buoyancy then related to the tracers through a
buoyancy state equation $b = b(T, S, z)$. The conservation of a given tracer $\theta$ is
expressed as


$$D_t\theta + \theta\,\nabla\cdot[(v - v_s)\,dt + \sigma\,dB_t] = F^\theta\,dt + D^\theta\,dt, \qquad (21)$$
where the variation of tracer quantity is balanced by a forcing term $F^\theta$ and a
diffusive term $D^\theta$. We note that here these terms are assumed to be regular in time,
although additional Brownian terms could be considered to encode intermittent
forcing. The resulting system, split into horizontal and vertical equations using the
convention $v = (u, w)$, is:


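Horizontal momentum:


$$D_t u + f\,e_z \times \left(u\,dt + \frac{1}{2}\sigma\,dB_t^H\right) = \nabla_H\left(-p' + \frac{\nu}{3}\nabla\cdot v\right)dt - \nabla_H\,dp_t^\sigma \qquad (22)$$


Vertical momentum:


$$D_t w = \frac{\partial}{\partial z}\left(-p' + \frac{\nu}{3}\nabla\cdot v\right)dt - \frac{\partial}{\partial z}dp_t^\sigma + b\,dt \qquad (23)$$

Temperature and salinity:

$$D_t T = \kappa_T\,\Delta T\,dt, \qquad (24)$$
$$D_t S = \kappa_S\,\Delta S\,dt, \qquad (25)$$

Incompressibility:

$$\nabla\cdot[v - v_s] = 0, \qquad \nabla\cdot\sigma\,dB_t = 0, \qquad (26)$$


Equation of state:


$$b = b(T, S, z). \qquad (27)$$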


Temperature and salinity are introduced as active tracers, as they modify the
buoyancy field, and their stochastic evolution is again obtained by application
of the SRTT (7), balanced with a diffusion process with diffusivities $\kappa_T$ and $\kappa_S$,
respectively. The unusual coefficient 1/2 in the random Coriolis term can be
shown to appear naturally from a derivation of the non-inertial acceleration in this
stochastic framework, again following the derivation of [12]. Metric terms relative
to the rotation of the Earth should also be adapted to the stochastic Frenet-Serret
formula $dC = \Omega\,dt \times C$ in the case of planetary-scale simulations. In Eqs. (22) and
(23) the stochastic pressure $dp_t^\sigma$ is introduced; it corresponds to a zero-mean turbulent
pressure related to the small-scale velocity component (i.e. the noise). It is a martingale
term. An operational model referred to as the primitive equations can be obtained
through the so-called hydrostatic balance, resulting from neglecting the vertical
acceleration terms through a proper scaling of the velocity. In our stochastic setting,
the vertical momentum equation reads, after neglecting the large-scale acceleration
terms and for moderate noise ($\Upsilon \sim O(1)$, so that the martingale terms related to the
vertical velocity component are negligible):

$$\frac{\partial p'}{\partial z} = b \quad\text{and}\quad \frac{\partial\,dp_t^\sigma}{\partial z} = 0, \qquad (28)$$
where the bounded variation terms and the martingale terms have been safely
separated. The left equation constitutes the usual hydrostatic balance. With the
scaling used, the stochastic pressure is constant along depth and is in balance
with the stochastic Coriolis component [5, 9]. These two martingale terms can
then be removed from the horizontal momentum equation. In this setting the vertical
velocity becomes a diagnostic variable that can be recovered by integrating the
continuity equation given by (26). In a similar way,
the large-scale pressure is obtained from the vertical integration of the hydrostatic
relation. The scaling parameter $\Upsilon$ can also be related to the ratio between the Mean
Kinetic Energy (MKE) and the Turbulent Kinetic Energy (TKE) when an advective time
scale is used, that is


$$\Upsilon = \frac{U^2}{A/\tau} = \frac{1}{\epsilon}\,\frac{\mathrm{MKE}}{\mathrm{TKE}}, \qquad (29)$$

where $\epsilon$ is the ratio of the fast-scale to the slow-scale correlation times.
This ratio can be adapted to the different variables involved (i.e. momentum,
temperature or salinity) with a value similar to the inverse of the Schmidt number
(ratio of diffusion rates), hence making the noise scaling parameter $\Upsilon$ dependent
on the variable transported. The parameter $\Upsilon$ appears in dimensional analysis and
asymptotic expansions, but also plays a paramount role in quantifying the
strength of the noise.


5 Methods 


The experiments are performed with the level-coordinate free-surface primitive
equation ocean model NEMO [16]. The domain is a double-gyre
configuration consisting of a 45° rotated beta plane centred at approximately 30°N,
3180 km long, 2120 km wide and 4 km deep. The domain is bounded by vertical walls and
a flat bottom. The seasonally varying wind and buoyancy forcings induce a strong
jet to appear diagonally in the domain, separating a warm sub-tropical gyre from a
cold sub-polar gyre. Three experiments were performed: two purely deterministic
simulations at different resolutions, 1/27° (R27d) and 1/3° (R3d), and one stochastic
simulation at 1/3° (R3LU). Each simulation was run for 10 years with data collected
every (and averaged over) 5 days. The focus of this paper is to assess the benefits
brought by LU to the coarse simulation, so the parameters of the simulation were
chosen to closely follow [17, 18] (see Table 1 for an overview of their values).
In this first study, we restrict ourselves to 3D divergence-free horizontal noise (i.e.
with no vertical component). In spectral form the random field and the variance
tensor can be written as:


$$\sigma\,dB_t = \sum_{i\in\mathbb{N}} \lambda_i^{1/2}\,\phi_i\,d\beta_t^i, \qquad a = \sum_{i\in\mathbb{N}} \lambda_i\,\phi_i\,\phi_i^T, \qquad (30)$$
Table 1 Parameters of the model experiments


                         R27d              R3d               R3LU
Horizontal resolution    1/27° (3.9 km)    1/3° (35.3 km)    1/3° (35.3 km)
Horizontal grid points   540 × 810         60 × 90           60 × 90
Vertical levels          30                30                30
Time step                5 min             20 min            20 min
Eddy viscosity —5x 107° m4s—! —107!? m4s7! —107!? m4s7! 
Eddy diffusivity —5x107!9 m4s7! 300 m’s~! 300 m?s~! 


where $\{\phi_i(x), i \in \mathbb{N}\}$ are the orthonormal eigenfunctions of the covariance operator
associated with $\{\lambda_i, i \in \mathbb{N}\}$, the (real, positive) eigenvalues arranged in decreasing
order, and $\{\beta^i, i \in \mathbb{N}\}$ is a set of standard (scalar) Brownian variables.
This representation corresponds to the Karhunen-Loève decomposition [24].
Operationally, the (finite) set of eigenfunctions $\{\phi_i(x), i \in [1, N]\}$ and of eigenvalues
$\{\lambda_i, i \in [1, N]\}$ are computed through a proper orthogonal decomposition (POD)
[11] of the temporal fluctuations of the two-dimensional low-resolution residual $u'_R$.
This velocity residual is obtained through Gaussian filtering of the high-resolution
deterministic simulation R27d, $u_R = (1 - G)\,u_H$, with the fluctuations computed
through Reynolds decomposition:


$$u'_R(x, t) = u_R(x, t) - \overline{u}_R(x) = \sum_{i=1}^{N} \phi_i(x)\,\alpha_i(t). \qquad (31)$$


The POD procedure applied to $u'_R(x, t)$ provides a set $\{\phi_i(x), i \in [1, N]\}$ of
eigenfunctions that are stationary in time and such that


$$\langle\phi_m, \phi_n\rangle = \int_\Omega \phi_m^T(x)\,\phi_n(x)\,dx = \delta_{mn}, \qquad \overline{\alpha_m\alpha_n} = \lambda_m\,\delta_{mn}. \qquad (32)$$


The eigenfunctions are used to define the random field and a stationary variance
tensor as


$$\sigma\,dB_t(x) = \sum_{i=1}^{M(z)} \vartheta_i\,\lambda_i^{1/2}\,\phi_i(x), \qquad a(x) = \sum_{i=1}^{M(z)} \lambda_i\,\phi_i(x)\,\phi_i^T(x), \qquad (33)$$


where $\vartheta_i = b_i\sqrt{\Delta t}$ and $M(z) \ll N$ is chosen to provide at least 85% of the
energy of the fluid layer. Due to the constraint posed by Eq. (26) on the noise,
incompressibility of the horizontal noise is imposed by applying a Helmholtz-Hodge
decomposition [19] to each snapshot of the horizontal velocity $u'_R$. Moreover,
the set of eigenfunctions $\{\phi_i(x), i \in [1, N]\}$ is used to construct the drift $\mu_t$ of
Eq. (4) in such a way that the distance between $\mu_t$ and $\overline{u}_R$ is minimized, that is
$$\mu_t = \sum_{i=1}^{N} \phi_i(x)\,\gamma_i \quad\text{with}\quad \gamma = \arg\min_{\gamma}\left\| \overline{u}_R(x, t) - \sum_{i=1}^{N} \phi_i(x)\,\gamma_i \right\|_2. \qquad (34)$$


Due to the orthogonality of the basis functions, the coefficients can be easily
recovered as the orthogonal projection $\gamma_i = \langle \overline{u}_R(x, t), \phi_i(x)\rangle$.
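
The POD-based construction of the noise can be sketched with a singular value decomposition. This is a simplified illustration and not the NEMO implementation: the snapshots are flattened to a `(n_time, n_space)` array, the spatial inner product is a plain dot product without quadrature weights, the Gaussian filtering and Helmholtz-Hodge steps are omitted, and all function names are assumptions of the example.

```python
import numpy as np

def pod_noise_basis(snapshots, energy=0.85):
    """Return the leading POD modes (M, n_space) and eigenvalues (M,) of the fluctuations."""
    fluct = snapshots - snapshots.mean(axis=0)     # remove the time mean
    _, s, vt = np.linalg.svd(fluct, full_matrices=False)
    lam = s**2 / fluct.shape[0]                    # time-averaged energy of each mode
    frac = np.cumsum(lam) / np.sum(lam)
    m = int(np.searchsorted(frac, energy)) + 1     # smallest M reaching the target
    return vt[:m], lam[:m]

def sample_noise(modes, lam, dt, rng):
    """One noise realisation: sum_i theta_i * sqrt(lam_i) * phi_i with theta_i ~ N(0, dt)."""
    theta = rng.standard_normal(len(lam)) * np.sqrt(dt)
    return (theta * np.sqrt(lam)) @ modes

def variance_diagonal(modes, lam):
    """Pointwise variance sum_i lam_i * phi_i(x)^2 for a scalar field."""
    return (lam[:, None] * modes**2).sum(axis=0)

def drift_coefficients(modes, mean_field):
    """Orthogonal projection of a mean field onto the retained modes."""
    return modes @ mean_field
```

Because the rows returned by the SVD are orthonormal, the least-squares fit of the drift reduces to the projection in `drift_coefficients`, which mirrors the remark on orthogonality above.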


6 Results 


In this work we focus on the results of a single realisation. From a qualitative point
of view, the effect of the coarsening of the resolution can be seen in Figs. 1 and 2,
where the leftmost panel represents the result of the R27d simulation, the central panel
shows the results of the R3d simulation and the rightmost panel shows the R3LU
simulation. The first noticeable characteristic of the R27d reference simulation is
the presence of a primary jet stream inclined at an almost −45° angle, starting at
the bottom-left corner and directed towards the centre, and a secondary, smaller jet
with the same inclination roughly 80 km above the primary. Both structures are
visible in the reference papers [17, 18]. In both figures the comparison
between the high-resolution and the low-resolution deterministic simulations shows a
degradation of the information about the jet streams. Figure 1, depicting the relative
vorticity $\zeta = (\partial_x v - \partial_y u)/f$, shows that the deterministic R3d simulation is
incapable of reproducing the primary jet characteristic and its positioning, though
it shows an increased activity in place of the secondary jet stream. The stochastic
R3LU simulation presents instead an intensification of the vortical activity in the



Fig. 1 10-year averaged relative vorticity $\zeta = (\partial_x v - \partial_y u)/f$ at the surface layer of the
model for deterministic high resolution (1/27°, left), for deterministic low resolution (1/3°,
middle) and for stochastic low resolution (1/3°, right)

Fig. 2 5-day averaged sea surface height of the model for deterministic high resolution (1/27°,
left), for deterministic low resolution (1/3°, middle) and for stochastic low resolution (1/3°, right)
Fig. 3 Left and centre panels: standard deviation of the kinetic energy. The color scale has
been adjusted to enhance the differences in the jet region, not considering the highly energetic
boundaries where peaks reach values of 0.2 m$^2$/s$^2$ for R3d and 0.17 m$^2$/s$^2$ for R3LU. Right
panel: the Gaussian relative entropy for relative vorticity, $\zeta$ (cold palette), and kinetic energy
(warm palette). The lighter colors represent the deterministic simulation R3d, the darker colors
represent the stochastic simulation R3LU. All the statistics are computed over 10 years


regions of the primary and secondary jets. Considering sea surface height, Fig. 2
shows that the best result is obtained by the stochastic simulation which, while not
able to distinguish the primary jet stream from the smaller vortices of the
secondary jet, is capable of reproducing the main behaviour. The left and centre
panels of Fig. 3 show the difference obtained in terms of variance of the kinetic
energy in the two coarse simulations, with greater variability obtained with the
stochastic model, especially in the area of the jet stream, where a lesser variability is
Fig. 4 Vertical profile of temperature after 1 year of simulation (left) and after 10 years (right)


shown in the deterministic case. From a quantitative point of view, the simulations
are compared using the Gaussian Relative Entropy (GRE) described in detail in [20],
which measures with a single criterion both the mean and variance reconstructions.
In the right panel of Fig. 3, values of the GRE for two variables, the relative vorticity
$\zeta$ and the kinetic energy $KE = (u^2 + v^2)/2$, are compared. For two different depths
and in a vertically averaged sense ($\overline{\mathrm{GRE}}$), the relative entropy is smaller for the
stochastic simulation, indicating a smaller distance from the distribution given by
the reference R27d simulation. The proposed stochastic model thus outperforms
the standard deterministic simulation in terms of both relative entropy and intrinsic
variability for kinetic energy and vorticity. This behaviour is observed in every layer.
In the tracer equations the noise has been scaled with the aid of the Schmidt number,
the ratio between the eddy viscosity and the eddy diffusivity. This consideration stems
from the fact that the correlation times for transport of momentum and of tracers
are not the same, and the difference can be expressed in terms of the Schmidt
number. Figure 4 shows the vertical profiles of horizontally averaged temperature,
$\overline{T}(z, t) = \int_A T(x, y, z, t)\,dx\,dy$, at times $t = 1$ year and $t = 10$ years for the three
simulations. The horizontally averaged temperature shows an increase in temperature
mixing in the stochastic setting with respect to its deterministic counterparts.
This process has been observed to be sensitive to the noise amplitude and might
be caused by the structure of the noise and by the effects of the Helmholtz-Hodge
decomposition. Further studies to investigate this process with three-dimensional
and isopycnal noise are ongoing.
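
For readers unfamiliar with the metric, a pointwise Gaussian relative entropy can be computed from the time mean and variance of each field. The sketch below uses the standard Kullback-Leibler divergence between two univariate Gaussians, with the reference in the first argument. Whether [20] uses exactly this form and ordering is an assumption of this example, as are the function name and the array layout (time along axis 0).

```python
import numpy as np

def gaussian_relative_entropy(ref, model, axis=0):
    """KL divergence between Gaussian fits of ref and model along the given axis."""
    mu_r, mu_m = ref.mean(axis=axis), model.mean(axis=axis)
    var_r, var_m = ref.var(axis=axis), model.var(axis=axis)
    ratio = var_r / var_m
    # The first term penalizes the mean error, the remaining terms the variance error.
    return 0.5 * ((mu_r - mu_m) ** 2 / var_m + ratio - 1.0 - np.log(ratio))
```

The measure is zero only when both mean and variance agree, which is why a single number can summarise both aspects of the reconstruction.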

7 Conclusions


The stochastic model considered has been implemented in the NEMO dynamical
core. A 3D horizontal, incompressible noise was considered and has been shown
to successfully improve the ability of a coarse simulation to reproduce the
dynamical quantities of interest, when corrected with a stochastic drift leading
to a change of probability measure. Both the qualitative behaviour of the jet
stream and the quantitative intrinsic variability of the model have been improved.
Thermodynamic quantities like temperature and salinity do not seem to benefit from
this implementation. In future work, more complex non-stationary fully 3D noises
will be investigated within the same setting.

Bridging Koopman Operator and
Time-Series Auto-Correlation Based
Hilbert-Schmidt Operator


Yicun Zhen, Bertrand Chapron, and Etienne Mémin 


Abstract Given a stationary continuous-time process $f(t)$, the Hilbert-Schmidt
operator $A_\tau$ can be defined for every finite $\tau$. Let $\lambda_{\tau,i}$ be the eigenvalues of $A_\tau$
in descending order. In this article, a Hilbert space $\mathcal{H}_f$ and the (time-shift)
continuous one-parameter semigroup of isometries $\mathcal{K}^s$ are defined. Let $\{v_i, i \in \mathbb{N}\}$
be the eigenvectors of $\mathcal{K}^s$ for all $s > 0$. Let $f = \sum_{i=1}^{\infty} a_i v_i + f^\perp$ be the
orthogonal decomposition with descending $|a_i|$. We prove that
$\lim_{\tau\to\infty} \lambda_{\tau,i} = |a_i|^2$.
The continuous one-parameter semigroup $\{\mathcal{K}^s : s > 0\}$ is equivalent, almost surely,
to the classical Koopman one-parameter semigroup defined on $L^2(X, \nu)$, if the
dynamical system is ergodic and has invariant measure $\nu$ on the phase space $X$.


Keywords Singular spectrum analysis · Koopman theory · Hilbert-Schmidt theory


1 Introduction 


Let $\{f(t) \in \mathbb{C} : t \ge 0\}$ be a continuous-time process. We assume that $f$ has zero
temporal mean and that the lagged moments exist for all $s \ge 0$:

$$\rho(s) = \lim_{T\to\infty} \frac{1}{T}\int_0^T f(t)\,\overline{f(t+s)}\,dt. \tag{1}$$

Define $\rho(-s) = \overline{\rho(s)}$. In [3] the self-adjoint operator $A_\tau$ is defined to act on $L^2([0,\tau])$:

$$(A_\tau g)(t) = \frac{1}{\tau}\int_0^\tau \rho(t-s)\,g(s)\,ds, \tag{2}$$


for every $g \in L^2([0,\tau])$ and for all $t \in [0,\tau]$. When $\rho \in L^2_{\mathrm{loc}}(\mathbb{R})$ and $\rho(s) \neq 0$
for almost all $s \in [0,\tau]$, $A_\tau$ is a Hilbert-Schmidt operator. In particular, $A_\tau$ is
compact and always has a pure point spectrum. In other words, the Hilbert
space $L^2([0,\tau])$ admits a basis $\{\varphi_i \in L^2([0,\tau]) : i \in \mathbb{N}\}$ such that each $\varphi_i$ is an
eigenvector of $A_\tau$. This implies a Karhunen-Loève type of decomposition: for any
$h \in L^2([0,\tau])$, there exist scalars $c_i \in \mathbb{C}$ such that

$$h(t) = \sum_{i} c_i \varphi_i(t), \tag{3}$$

for any $t \in [0,\tau]$.

As stated in [3], the singular spectrum analysis (SSA) algorithm is based on
the spectral analysis of $A_\tau$. Given a finite sequence of discrete-time measurements
$\{f(n\Delta t) : n = 0, 1, 2, \ldots, N+M, \text{ and } (N+M)\Delta t \le \tau\}$, a discretized version
of $A_\tau$ can be approximated by the $(N+1)\times(N+1)$ matrix

$$C_N = \frac{1}{M+1}\, H_{N,M} H_{N,M}^{*}, \tag{4}$$

where $H_{N,M}$ is the trajectory matrix defined by

$$H_{N,M} = \begin{pmatrix}
f(0) & f(\Delta t) & \cdots & f(M\Delta t) \\
f(\Delta t) & f(2\Delta t) & \cdots & f((M+1)\Delta t) \\
\vdots & \vdots & & \vdots \\
f(N\Delta t) & f((N+1)\Delta t) & \cdots & f((N+M)\Delta t)
\end{pmatrix}, \tag{5}$$

and $H_{N,M}^{*}$ refers to the conjugate transpose of $H_{N,M}$. The matrix $H_{N,M}$ can be computed
numerically whenever a discrete-time time series is available. Intuitively, for $\tau$ large
enough and $\Delta t$ small enough, $C_N$ is a good approximation of $A_\tau$. The SSA method
starts with calculating the spectral quantities (i.e. eigenvectors, eigenvalues) of $C_N$.
The spectral quantities of $A_\tau$ are the theoretical quantities that the spectral quantities
of $C_N$ are supposed to represent.
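
As a minimal sketch of the construction in Eqs. (4) and (5), the following Python snippet builds the trajectory matrix and the matrix $C_N$ with NumPy. The function names `trajectory_matrix` and `ssa_matrix`, the test signal (two complex exponentials) and all parameter values are assumptions chosen for illustration; they are not taken from [3].

```python
import numpy as np

def trajectory_matrix(samples, n_rows):
    """Hankel trajectory matrix with entries f((i + j) * dt), shape (N+1, M+1)."""
    samples = np.asarray(samples)
    n_cols = samples.size - n_rows + 1          # this is M + 1
    if n_cols < 1:
        raise ValueError("not enough samples for the requested number of rows")
    idx = np.arange(n_rows)[:, None] + np.arange(n_cols)[None, :]
    return samples[idx]

def ssa_matrix(samples, n_rows):
    """C_N = H H^* / (M + 1), as in Eq. (4)."""
    h = trajectory_matrix(samples, n_rows)
    return h @ h.conj().T / h.shape[1]

# Hypothetical test signal: two complex exponentials sampled with step dt.
dt = 0.1
t = dt * np.arange(2000)
f = 2.0 * np.exp(1j * 1.0 * t) + 1.0 * np.exp(-1j * 2.5 * t)

c = ssa_matrix(f, n_rows=50)
eigvals = np.linalg.eigvalsh(c)[::-1]           # descending order
```

For this test signal the trajectory matrix has rank two, so only the two leading entries of `eigvals` are numerically nonzero.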

While in practice the SSA method has been applied successfully to a large variety
of time series, from a theoretical standpoint, yet with practical consequences, one may
ask: what is the relation between $A_{\tau_1}$ and $A_{\tau_2}$ for different $\tau_1$ and $\tau_2$? What
is the asymptotic behavior of $A_\tau$ as $\tau \to \infty$? In what way are the spectral properties
of $A_\tau$ related to intrinsic properties of the dynamical system? These questions are
important because for real-world data it is often not possible to refine the sampling
time $\Delta t$, whereas longer time series are sometimes available.
In this article we generalize the ideas and tools developed in [4] and apply them
to the study of $A_\tau$. We shall prove that

$$\lim_{\tau\to\infty} \lambda_{\tau,i} = |a_i|^2, \tag{6}$$


where $\lambda_{\tau,i}$ is the $i$-th largest eigenvalue of $A_\tau$ and $a_i$ is the $i$-th largest (in modulus)
coefficient of some eigenvector $v_i$ (of unit length) of the time-shift operator $\mathcal{K}^s$ (for
all $s \ge 0$) in the orthogonal decomposition of $f$:

$$f = \sum_{i=1}^{\infty} a_i v_i + f^{\perp}, \tag{7}$$

where $f^{\perp}$ denotes the component of $f$ in the orthogonal complement of the
space spanned by the eigenvectors of the time-shift operator. If there are only finitely
many terms (say $N$) in the summation in Eq. (7), then we set $a_i = 0$ for $i > N$.
The time-shift operator $\mathcal{K}^s$ is closely related to the classical Koopman operator,
which is defined to act, as a time-shift operator, on some function space whose
domain is the whole phase space of the dynamical system.

In Sect. 2 we present the main result and a brief introduction to the mathematical
tools used in its proof. All the quantities mentioned above are
defined rigorously in Sect. 2. The detailed proof of the main result is presented in
Sect. 3.


Notes and Comments The main result as well as the techniques and ideas used for
the proof are close in spirit to those developed in [4]. However, the Hilbert-Schmidt
operator $A_\tau$ is defined for continuous-time processes, and the theory developed in [4]
does not cover the continuous-time case. The objective of this paper is to confirm
that the asymptotic behavior of the Hilbert-Schmidt operator $A_\tau$ is closely related to
Koopman theory.


2 Preliminaries and the Main Result 


Let $\{f(t) : t \ge 0\}$ be a continuous-time process.

Assumption 1 Assume that

$$\lim_{T\to\infty} \frac{1}{T}\int_0^T f(t)\,dt = 0, \tag{8}$$

and that $\rho(s)$ is well-defined by Eq. (1) for all $s \ge 0$.

For any $s \ge 0$, we use $F_s$ to denote the time series $\{F_s(t) = f(t+s) : t \ge 0\}$. For
any two time series $g = \{g(t) : t \ge 0\}$ and $h = \{h(t) : t \ge 0\}$, we define the new
time series

$$ag + bh = \{a\,g(t) + b\,h(t) : t \ge 0\}, \tag{9}$$

where $a, b \in \mathbb{C}$. We consider the following linear space:

$$H_f = \mathrm{Span}_{\mathbb{C}}\{F_s : s \ge 0\}. \tag{10}$$


Each element $h \in H_f$ can be written as

$$h = \sum_{i=1}^{n} c_i F_{s_i}, \tag{11}$$

for some $n \ge 1$, $c_i \in \mathbb{C}$, $s_i \ge 0$. The existence of $\rho(s)$ allows us to define the
following positive semi-definite Hermitian form:

$$\langle h, g\rangle = \lim_{T\to\infty} \frac{1}{T}\int_0^T h(t)\,\overline{g(t)}\,dt. \tag{12}$$

Let $V = \{v \in H_f : \langle v, v\rangle = 0\}$. Since the Hermitian form is positive semi-definite,
$V$ is a linear subspace of $H_f$, and the Hermitian form is strictly positive-definite on
the quotient space $H_f/V$. Hence it defines an inner product on $H_f/V$. We define

$$\mathcal{H}_f = \overline{H_f/V}, \tag{13}$$

where the closure is taken with respect to the inner product defined above.
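We define the operator $\mathcal{K}^s$ on $H_f$ for any $s, s_1 \ge 0$:

$$\mathcal{K}^s F_{s_1} = F_{s_1+s}. \tag{14}$$

Since the time average in Eq. (12) is unaffected by a finite shift of the integration
window, it is obvious that

$$\langle \mathcal{K}^s h, \mathcal{K}^s g\rangle = \langle h, g\rangle, \tag{15}$$

for any $h, g \in H_f$ and any $s \ge 0$. Hence $\mathcal{K}^s$ is well-defined on $H_f/V$, and can
be further extended to the whole of $\mathcal{H}_f$ by continuity. Therefore we obtain a
one-parameter family of isometric operators $\mathcal{K}^s$ acting on the Hilbert space $\mathcal{H}_f$.
Obviously, we have

$$\mathcal{K}^{s_1}\mathcal{K}^{s_2} = \mathcal{K}^{s_1+s_2}. \tag{16}$$

To simplify the notation, we also use $f$ to denote the continuous-time process $F_0$.
We further assume that

Assumption 2

$$\lim_{s\to 0^+} \|\mathcal{K}^s f - f\|_{\mathcal{H}_f} = 0. \tag{17}$$
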
In other words, Assumption 2 states that the curve

$$\gamma : [0,\infty) \to \mathcal{H}_f, \qquad t \mapsto \mathcal{K}^t f \tag{18}$$

is continuous. Since $\mathcal{H}_f$ is generated by $f$ and $\mathcal{K}^s$ are isometries for all $s \ge 0$,
Assumption 2 implies that $\mathcal{K}^s \to I$ in the strong operator topology as $s \to 0^+$. In
other words, $\{\mathcal{K}^s : s \ge 0\}$ forms a strongly continuous semigroup of isometries on
$\mathcal{H}_f$.

Under Assumption 2, we have the following decomposition theorem (see
Theorem 9.3 in [2]).


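Theorem 1 Let $\{\mathcal{K}^s : s \ge 0\}$ be a strongly continuous semigroup of isometries on
a Hilbert space $\mathcal{H}$. Then $\mathcal{H}$ has the orthogonal decomposition $\mathcal{H} = \mathcal{H}_U \oplus \mathcal{H}_{NU}$,
where $\mathcal{H}_U = \bigcap_{s \ge 0} \mathcal{K}^s\mathcal{H}$, and $\mathcal{H}_{NU}$ is isomorphic to $L^2([0,\infty), \mathcal{H}_0)$ for some Hilbert
space $\mathcal{H}_0$. $\mathcal{H}_U$ and $\mathcal{H}_{NU}$ are invariant under $\mathcal{K}^s$ for all $s \ge 0$. The operators $\mathcal{K}^s$
restricted to $\mathcal{H}_U$ form a strongly continuous semigroup of unitary operators, and $\mathcal{K}^s$
restricted to $\mathcal{H}_{NU}$ acts as the unilateral shift operator, i.e. for any $\gamma \in \mathcal{H}_{NU} =
L^2([0,\infty), \mathcal{H}_0)$,

$$(\mathcal{K}^s\gamma)(t) = \gamma(t+s) \in \mathcal{H}_0. \tag{19}$$

Theorem 1 provides us with a useful tool to deal with the completely nonunitary
component of $\mathcal{K}^s$. For the unitary component, we have the following spectral
representation theorem.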


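Theorem 2 Let $\{U(s) : s \ge 0\}$ be a strongly continuous semigroup of unitary
operators on a Hilbert space $\mathcal{H}$. Assume that $\mathcal{H}$ can be generated by $U$ and some
$f \in \mathcal{H}$. Then there exists a unitary map $\phi : \mathcal{H} \to L^2(\mathbb{R}, d\mu)$, where $\mu$ is some
positive finite measure on $\mathbb{R}$, such that

$$\phi(f)(x) = 1, \tag{20}$$
$$\phi(U(s)g)(x) = e^{isx}\,(\phi(g))(x), \tag{21}$$

for all $g \in \mathcal{H}$, $x \in \mathbb{R}$, and $s \ge 0$.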


Theorems 1 and 2 suggest the orthogonal decomposition $\mathcal{H}_f = \mathcal{H}_{f,U} \oplus \mathcal{H}_{f,NU} =
L^2(\mathbb{R}, d\mu_f) \oplus L^2([0,\infty), \mathcal{H}_{f,0})$. Furthermore, we can write $\mu_f = \mu_{f,d} + \mu_{f,c}$,
where $\mu_{f,d}$ is a countable sum of Dirac measures and $\mu_{f,c}$ is a continuous
(atomless) measure, which can be composed both of an absolutely
continuous part and a singular continuous part with respect to the Lebesgue measure.
The decomposition of $\mu_f$ suggests
the orthogonal decomposition $\mathcal{H}_{f,U} = L^2(\mathbb{R}, d\mu_{f,d}) \oplus L^2(\mathbb{R}, d\mu_{f,c})$. In sum, we
have

$$f = f_{NU} + f_d + f_c, \tag{22}$$

where $f_{NU} \in L^2([0,\infty), \mathcal{H}_{f,0})$, $f_d \in L^2(\mathbb{R}, d\mu_{f,d})$, and $f_c \in L^2(\mathbb{R}, d\mu_{f,c})$. Note
that these subspaces are pairwise orthogonal and are all invariant under $\mathcal{K}^s$ for all
$s \ge 0$. The support of $\mu_{f,d}$ consists of countably many points. Each point $x_i$ in the
support of $\mu_{f,d}$ corresponds to an eigenvector $v_i \in \mathcal{H}_f$ of $\mathcal{K}^s$ for all $s \ge 0$, i.e.

$$\phi(a_i v_i)(x) = \begin{cases} 1 & \text{if } x = x_i, \\ 0 & \text{otherwise,} \end{cases} \tag{23}$$

and $\mu_{f,d}(\{x_i\}) = |a_i|^2$, where the $a_i$ are the coefficients of the eigenvectors in the
following decomposition:

$$f = \sum_{i} a_i v_i + f_{NU} + f_c. \tag{24}$$

We rearrange the indices of $v_i$ so that $|a_1| \ge |a_2| \ge \cdots \ge 0$. In order to make the
connection with $A_\tau$, we need the following lemmas.


Lemma 1 For any $\tau > 0$ and any $g \in L^2([0,\tau])$, the integral

$$\int_0^\tau g(s)\,\mathcal{K}^s f\,ds \tag{25}$$

is well-defined and is an element of $\mathcal{H}_f$.

The proofs of this and the following lemma use standard arguments from mathematical
analysis and are left to the interested reader.
Let

$$\mathcal{H}_{I,f} = \left\{ \int_0^\tau g(s)\,\mathcal{K}^s f\,ds : \tau > 0,\ g \in L^2([0,\tau]) \right\}. \tag{26}$$

$\mathcal{H}_{I,f}$ is a linear subspace of $\mathcal{H}_f$. We have

Lemma 2

$$\overline{\mathcal{H}_{I,f}} = \mathcal{H}_f. \tag{27}$$
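For simplicity, we use the notation $L^2_\tau := L^2([0,\tau])$. Given Lemma 1, for any
$g_1, g_2 \in L^2([0,\tau])$, we define the Hermitian form $A_\tau : L^2_\tau \times L^2_\tau \to \mathbb{C}$:

$$A_\tau(g_1, g_2) = \frac{1}{\tau}\left\langle \int_0^\tau g_1(s)\,\mathcal{K}^s f\,ds,\ \int_0^\tau g_2(t)\,\mathcal{K}^t f\,dt \right\rangle_{\mathcal{H}_f}. \tag{28}$$

The Cauchy-Schwarz inequality implies that

$$|A_\tau(g_1, g_2)|^2 \le \frac{1}{\tau^2}\left\| \int_0^\tau g_1(s)\,\mathcal{K}^s f\,ds \right\|^2_{\mathcal{H}_f} \left\| \int_0^\tau g_2(t)\,\mathcal{K}^t f\,dt \right\|^2_{\mathcal{H}_f} \tag{29}$$
$$\le \|g_1\|^2_{L^2_\tau}\,\|g_2\|^2_{L^2_\tau}\,\|f\|^4_{\mathcal{H}_f}, \tag{30}$$

where $\langle\cdot,\cdot\rangle_{L^2_\tau}$ refers to the inner product in $L^2_\tau$ and $\langle\cdot,\cdot\rangle_{\mathcal{H}_f}$ refers to the inner product
in $\mathcal{H}_f$. Therefore the Riesz representation theorem guarantees that there exists a bounded
linear operator $A_\tau : L^2_\tau \to L^2_\tau$ such that $A_\tau(g_1, g_2) = \langle g_1, A_\tau g_2\rangle_{L^2_\tau}$. Consequently,

$$(A_\tau g)(t) = \frac{1}{\tau}\left\langle \int_0^\tau g(s)\,\mathcal{K}^s f\,ds,\ \mathcal{K}^t f \right\rangle_{\mathcal{H}_f} = \frac{1}{\tau}\int_0^\tau g(s)\,\rho(t-s)\,ds, \tag{31}$$

which is the same as the definition of $A_\tau$ in [3]. Assumption 2 implies that $\rho \in
L^2_{\mathrm{loc}}(\mathbb{R})$. This implies that $A_\tau$ is a Hilbert-Schmidt operator on $L^2_\tau$. We shall use the
following variational description of the eigenvalues.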


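Proposition 1 (The Min-Max Principle) Let $\mathcal{H}$ be a Hilbert space and $A$ a
Hermitian operator on $\mathcal{H}$. Let $\lambda_1 \ge \lambda_2 \ge \cdots$ be the eigenvalues of $A$ in descending
order. Then

$$\lambda_i = \max_{\substack{M \subset \mathcal{H} \\ \dim M = i}}\ \min_{v \in M} \frac{\langle v, A v\rangle}{\|v\|^2}. \tag{32}$$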

Our main result states the following.

Theorem 3 (Main Result) Under Assumptions 1 and 2, we have, for all $i \in \mathbb{N}$,

$$\lim_{\tau\to\infty} \lambda_{\tau,i} = |a_i|^2, \tag{33}$$

where $\lambda_{\tau,i}$ stands for the eigenvalues of $A_\tau$.
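
The statement of the main result can be checked numerically on a toy example. The sketch below is an illustration under stated assumptions, not the authors' procedure: it takes the hypothetical signal $f(t) = \sum_j a_j e^{i\omega_j t}$, for which the time-averaged lagged moment is $\rho(s) = \sum_j |a_j|^2 e^{-i\omega_j s}$, discretizes the kernel $\rho(t-s)/\tau$ on a midpoint grid, and returns the leading eigenvalues. The function names, amplitudes, frequencies and grid size are all illustrative choices.

```python
import numpy as np

def rho(s, amps, freqs):
    """Lagged moment of f(t) = sum_j a_j exp(i w_j t): sum_j |a_j|^2 exp(-i w_j s)."""
    total = np.zeros(np.shape(s), dtype=complex)
    for a, w in zip(amps, freqs):
        total += abs(a) ** 2 * np.exp(-1j * w * s)
    return total

def top_eigenvalues(tau, amps, freqs, n=800, k=3):
    """Leading eigenvalues of a midpoint-rule discretization of A_tau."""
    t = (np.arange(n) + 0.5) * tau / n            # midpoint grid on [0, tau]
    kernel = rho(t[:, None] - t[None, :], amps, freqs)
    a_tau = kernel / n                            # (1 / tau) * rho(t - s) * (tau / n)
    return np.linalg.eigvalsh(a_tau)[::-1][:k]

amps = np.array([2.0, 1.0])                       # assumed coefficients a_j
freqs = np.array([1.0, -2.5])                     # assumed eigenfrequencies
for tau in (2.0, 20.0, 200.0):
    print(tau, top_eigenvalues(tau, amps, freqs))
```

As $\tau$ grows, the two leading eigenvalues should approach $|a_1|^2 = 4$ and $|a_2|^2 = 1$, while the remaining ones stay at zero because this toy signal has no continuous or nonunitary component.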


The following Proposition [4] demonstrates the correspondence between the 
eigenfrequencies of the continuous-time time-shift operator and the discrete-time 
time-shift operator. Please refer to [4] for the notations in the proposition. 


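Proposition 2 Let $\{f(X_t) : t \ge 0\}$ be a continuous-time process for which $\rho_s$ exists
for all $s \ge 0$. Let $\Delta t > 0$ be a time step. Assume that

$$\lim_{T\to\infty} \frac{1}{T}\int_0^T f(X_t)\,\overline{f(X_{t+k\Delta t})}\,dt = \lim_{T\to\infty} \frac{\Delta t}{T}\sum_{n=1}^{T/\Delta t} f(X_{n\Delta t})\,\overline{f(X_{(n+k)\Delta t})}, \tag{34}$$

for all $k \in \mathbb{N}$. Then $\mathcal{H}_f \supseteq \mathcal{H}_f^{\Delta t}$. Let $q$ be an eigenfrequency of the discrete-time
operator $K^{\Delta t}$, i.e. there exists $h \in \mathcal{H}_f^{\Delta t} \subseteq \mathcal{H}_f$ such that $K^{\Delta t} h = e^{iq} h$. Then there
exists an integer $k$, and $h_k \in \mathcal{H}_f^{\Delta t}$, such that

$$\mathcal{K}^s h_k = e^{i\frac{q+2k\pi}{\Delta t}s}\,h_k \tag{35}$$

for all $s \ge 0$.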
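
The correspondence in Eq. (35) is the familiar aliasing relation: a discrete-time eigenfrequency $q$ only determines the continuous-time frequency up to a multiple of $2\pi/\Delta t$. A minimal sketch follows; the helper names `wrap_phase` and `candidate_frequencies` and the numerical values are hypothetical and only serve to illustrate the formula.

```python
import math

def wrap_phase(theta):
    """Wrap an angle to the interval (-pi, pi]."""
    wrapped = math.fmod(theta + math.pi, 2.0 * math.pi)
    if wrapped <= 0.0:
        wrapped += 2.0 * math.pi
    return wrapped - math.pi

def candidate_frequencies(q, dt, k_values):
    """Continuous-time frequencies (q + 2 k pi) / dt compatible with a discrete phase q."""
    return [(q + 2.0 * math.pi * k) / dt for k in k_values]

# Assumed example: a true frequency of 40 rad per unit time sampled with dt = 0.1.
true_freq = 40.0
dt = 0.1
q = wrap_phase(true_freq * dt)                    # discrete-time eigenfrequency
candidates = candidate_frequencies(q, dt, range(-2, 3))
```

The true frequency appears among `candidates` for $k = 1$, whereas the choice $k = 0$ would return an aliased value.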


Remark 1 It is worth pointing out that the one-parameter semigroup of isometries
$\{\mathcal{K}^s : s \ge 0\}$ is equivalent to the classical Koopman one-parameter semigroup
$\{K^s : s \ge 0\}$, which acts on $L^2(X, d\nu)$, almost surely (with respect to the initial
state of the time series), if the dynamical system is ergodic and has a finite invariant
measure $\nu$ on the phase space $X$. Indeed, if $f \in L^2(X, \nu)$, then $f\,\overline{K^s f} \in L^1(X, d\nu)$
and the Birkhoff ergodic theorem states that $\rho(s) = \nu(f\,\overline{K^s f})$ for almost every initial
state $x_0 \in X$. In other words, $\langle f, \mathcal{K}^s f\rangle_{\mathcal{H}_f} = \langle f, K^s f\rangle_{L^2(X, d\nu)}$. Note that $f$ is
interpreted as a given time series on the left of the equality and as a
function on the right of the equality. This shows that, under the assumption that
the dynamical system is ergodic and (finite) measure-preserving, there is a natural
isometric bijection from $\mathcal{H}_f$ to $L^2(X, d\nu)$.


For mathematical interest, we present the main result in an abstract mathematical
form.


Theorem 4 (Main Result in Mathematical Form) Let $\mathcal{H}$ be a Hilbert space and
$\{\mathcal{K}^s : s \ge 0\}$ a strongly continuous one-parameter semigroup of isometries acting
on $\mathcal{H}$. For any $f \in \mathcal{H}$, let $f = \sum_i a_i v_i + f^{\perp}$, where the $v_i$ are the common
eigenvectors of $\mathcal{K}^s$ for all $s \ge 0$, and $f^{\perp}$ is the component of $f$ that is orthogonal
to the eigenspaces of $\mathcal{K}^s$ for all $s \ge 0$. Assume that $|a_1| \ge |a_2| \ge \cdots \ge 0$. For
any $\tau > 0$, let $A_{f,\tau}$ be the Hermitian operator on $L^2([0,\tau])$ such that for any
$g \in L^2([0,\tau])$ and any $t \in [0,\tau]$,

$$(A_{f,\tau}\,g)(t) = \frac{1}{\tau}\int_0^\tau g(s)\,\langle \mathcal{K}^s f, \mathcal{K}^t f\rangle\,ds. \tag{36}$$

Then $A_{f,\tau}$ is a Hilbert-Schmidt operator and hence has pure point spectrum.
Let $\lambda_{f,\tau,i}$ be the $i$-th largest eigenvalue of $A_{f,\tau}$. Then we have

$$\lim_{\tau\to\infty} \lambda_{f,\tau,i} = |a_i|^2. \tag{37}$$

3 Proof of the Main Result 


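For any fixed small $\epsilon > 0$, we choose $k$ so that $\sum_{i=k+1}^{\infty} |a_i|^2 < \epsilon$. We have the
orthogonal decomposition

$$f = f_d + f_{NU} + f_c = \sum_{i=1}^{k} a_i v_i + \sum_{i=k+1}^{\infty} a_i v_i + f_{NU} + f_c = f_{d,k} + f_{d,\epsilon} + f_{NU} + f_c, \tag{38}$$

where $f_{d,k} \in \mathcal{H}_{f,d,k}$, which is the subspace of $\mathcal{H}_{f,d}$ spanned by $\{v_1, \ldots, v_k\}$,
$f_{d,\epsilon} \in \mathcal{H}_{f,d,\epsilon}$, the subspace spanned by the rest of the eigenvectors, $f_{NU} \in \mathcal{H}_{f,NU}$,
and $f_c \in \mathcal{H}_{f,c}$. Note that $\mathcal{H}_{f,d,k}$, $\mathcal{H}_{f,d,\epsilon}$, $\mathcal{H}_{f,NU}$, and $\mathcal{H}_{f,c}$ are pairwise orthogonal
and invariant subspaces of $\mathcal{H}_f$. Hence, following Eq. (28), for any $g_1, g_2 \in L^2_\tau$,

\begin{align}
A_\tau(g_1, g_2) &= \frac{1}{\tau}\left\langle \int_0^\tau g_1(s)\,\mathcal{K}^s f\,ds,\ \int_0^\tau g_2(t)\,\mathcal{K}^t f\,dt \right\rangle_{\mathcal{H}_f} \notag\\
&= \frac{1}{\tau}\left\langle \int_0^\tau g_1(s)\,\mathcal{K}^s (f_{d,k} + f_{d,\epsilon} + f_c + f_{NU})\,ds,\ \int_0^\tau g_2(t)\,\mathcal{K}^t (f_{d,k} + f_{d,\epsilon} + f_c + f_{NU})\,dt \right\rangle_{\mathcal{H}_f} \notag\\
&= \frac{1}{\tau}\left\langle \int_0^\tau g_1(s)\,\mathcal{K}^s f_{d,k}\,ds,\ \int_0^\tau g_2(t)\,\mathcal{K}^t f_{d,k}\,dt \right\rangle_{\mathcal{H}_f} \notag\\
&\quad + \frac{1}{\tau}\left\langle \int_0^\tau g_1(s)\,\mathcal{K}^s f_{d,\epsilon}\,ds,\ \int_0^\tau g_2(t)\,\mathcal{K}^t f_{d,\epsilon}\,dt \right\rangle_{\mathcal{H}_f} \notag\\
&\quad + \frac{1}{\tau}\left\langle \int_0^\tau g_1(s)\,\mathcal{K}^s f_{c}\,ds,\ \int_0^\tau g_2(t)\,\mathcal{K}^t f_{c}\,dt \right\rangle_{\mathcal{H}_f} \notag\\
&\quad + \frac{1}{\tau}\left\langle \int_0^\tau g_1(s)\,\mathcal{K}^s f_{NU}\,ds,\ \int_0^\tau g_2(t)\,\mathcal{K}^t f_{NU}\,dt \right\rangle_{\mathcal{H}_f} \notag\\
&= \langle g_1, A_{\tau,d,k}\,g_2\rangle_{L^2_\tau} + \langle g_1, A_{\tau,d,\epsilon}\,g_2\rangle_{L^2_\tau} + \langle g_1, A_{\tau,c}\,g_2\rangle_{L^2_\tau} + \langle g_1, A_{\tau,NU}\,g_2\rangle_{L^2_\tau}, \tag{39}
\end{align}

in which the definitions of $A_{\tau,d,k}$, $A_{\tau,d,\epsilon}$, $A_{\tau,c}$ and $A_{\tau,NU}$ are obvious. It is not hard
to show that $A_{\tau,d,k}$, $A_{\tau,d,\epsilon}$, $A_{\tau,c}$ and $A_{\tau,NU}$ all admit eigendecompositions since they
are all Hilbert-Schmidt Hermitian operators. Note that the cross terms all vanish,
as $\mathcal{H}_{f,d,k}$, $\mathcal{H}_{f,d,\epsilon}$, $\mathcal{H}_{f,c}$ and $\mathcal{H}_{f,NU}$ are pairwise orthogonal and invariant under $\mathcal{K}^s$
for all $s \ge 0$.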

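Let $\lambda_{\tau,d,k,i}$, $\lambda_{\tau,d,\epsilon,i}$, $\lambda_{\tau,c,i}$, and $\lambda_{\tau,NU,i}$ be the $i$-th largest eigenvalues of $A_{\tau,d,k}$,
$A_{\tau,d,\epsilon}$, $A_{\tau,c}$, and $A_{\tau,NU}$, respectively. We will prove the following identities:

Proposition 3

$$\lim_{\tau\to\infty} \lambda_{\tau,d,k,i} = |a_i|^2 \quad \text{for } i = 1, \ldots, k, \tag{40}$$
$$\lambda_{\tau,d,\epsilon,1} < \epsilon \quad \text{for any } \tau > 0, \tag{41}$$
$$\lim_{\tau\to\infty} \lambda_{\tau,c,1} = 0, \tag{42}$$
$$\lim_{\tau\to\infty} \lambda_{\tau,NU,1} = 0. \tag{43}$$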

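Before we start to prove Eqs. (40)-(43), note that Propositions 1 and
3 directly imply the main result. Indeed, for any fixed $n$ and any $\epsilon > 0$, we can find
$k$ so that $n \le k$ and $\sum_{i=k+1}^{\infty} |a_i|^2 < \epsilon$. Then we find $\tau$ large enough so that $\lambda_{\tau,c,1} < \epsilon$
and $\lambda_{\tau,NU,1} < \epsilon$. Note that $A_{\tau,d,k}$, $A_{\tau,d,\epsilon}$, $A_{\tau,c}$, and $A_{\tau,NU}$ are all positive
semi-definite. Applying the min-max principle, we have

\begin{align}
\lambda_{\tau,n} &= \max_{\substack{M \subset L^2_\tau \\ \dim M = n}}\ \min_{v \in M} \frac{\langle v, A_\tau v\rangle}{\|v\|^2} \tag{44}\\
&= \max_{\substack{M \subset L^2_\tau \\ \dim M = n}}\ \min_{v \in M} \frac{\langle v, A_{\tau,d,k} v\rangle + \langle v, A_{\tau,d,\epsilon} v\rangle + \langle v, A_{\tau,c} v\rangle + \langle v, A_{\tau,NU} v\rangle}{\|v\|^2} \tag{45}\\
&\ge \max_{\substack{M \subset L^2_\tau \\ \dim M = n}}\ \min_{v \in M} \frac{\langle v, A_{\tau,d,k} v\rangle}{\|v\|^2} = \lambda_{\tau,d,k,n}, \tag{46}
\end{align}

and that

\begin{align}
\lambda_{\tau,n} &= \max_{\substack{M \subset L^2_\tau \\ \dim M = n}}\ \min_{v \in M} \frac{\langle v, A_\tau v\rangle}{\|v\|^2} \tag{47}\\
&\le \max_{\substack{M \subset L^2_\tau \\ \dim M = n}}\ \min_{v \in M} \frac{\langle v, A_{\tau,d,k} v\rangle}{\|v\|^2} + 2\epsilon = \lambda_{\tau,d,k,n} + 2\epsilon. \tag{48}
\end{align}

Combined with Eq. (40), this implies Theorem 3.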
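
The two-sided bound obtained from the min-max principle is an instance of a general fact for Hermitian matrices: adding a positive semi-definite perturbation raises each ordered eigenvalue by at most the largest eigenvalue of the perturbation. The following sketch checks this on random matrices; the matrices, their sizes and the random seed are arbitrary assumptions and only stand in for $A_{\tau,d,k}$ and the remainder terms.

```python
import numpy as np

def sorted_eigenvalues(matrix):
    """Eigenvalues of a Hermitian matrix in descending order."""
    return np.linalg.eigvalsh(matrix)[::-1]

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 3))
main = x @ x.T                       # positive semi-definite, plays the role of A_{tau,d,k}
y = 0.1 * rng.standard_normal((6, 6))
rest = y @ y.T                       # small positive semi-definite remainder

lam_main = sorted_eigenvalues(main)
lam_rest = sorted_eigenvalues(rest)
lam_sum = sorted_eigenvalues(main + rest)

# Lower bound as in Eq. (46), upper bound in the spirit of Eq. (48).
lower_ok = np.all(lam_sum >= lam_main - 1e-10)
upper_ok = np.all(lam_sum <= lam_main + lam_rest[0] + 1e-10)
```

Both `lower_ok` and `upper_ok` should evaluate to true for any choice of positive semi-definite matrices.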


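Proof (Equation (40)) Recall from Eq. (23) that each eigenvector $v_i$ corresponds
to a point $x_i$ in the support of $\mu_{f,d}$. For any $g \in L^2_\tau$, Theorem 2 states that
$\int_0^\tau g(s)\,\mathcal{K}^s f_{d,k}\,ds$ has the following representation in $L^2(\mathbb{R}, d\mu_f)$: for any $x \in \mathbb{R}$,

$$\left(\phi\left(\int_0^\tau g(s)\,\mathcal{K}^s f_{d,k}\,ds\right)\right)(x) = \begin{cases} \int_0^\tau g(s)\,e^{isx_j}\,ds & \text{if } x = x_j \text{ for some } j \le k, \\ 0 & \text{otherwise.} \end{cases} \tag{49}$$

And

\begin{align}
\langle g, A_{\tau,d,k}\,g\rangle_{L^2_\tau} &= \frac{1}{\tau}\left\langle \int_0^\tau g(s)\,\mathcal{K}^s f_{d,k}\,ds,\ \int_0^\tau g(t)\,\mathcal{K}^t f_{d,k}\,dt \right\rangle_{\mathcal{H}_f} \tag{50}\\
&= \frac{1}{\tau}\left\| \phi\left(\int_0^\tau g(s)\,\mathcal{K}^s f_{d,k}\,ds\right) \right\|^2_{L^2(\mathbb{R}, d\mu_f)} \tag{51}\\
&= \frac{1}{\tau}\sum_{j=1}^{k} |a_j|^2 \left| \int_0^\tau g(s)\,e^{isx_j}\,ds \right|^2. \tag{52}
\end{align}

Let $\ell_j \in L^2_\tau$ be such that $\ell_j(s) = e^{-isx_j}$ for any $s \in [0,\tau]$. Then $\|\ell_j\|^2_{L^2_\tau} = \tau$ and

$$\langle g, A_{\tau,d,k}\,g\rangle_{L^2_\tau} = \sum_{j=1}^{k} \frac{|a_j|^2}{\tau}\,|\langle \ell_j, g\rangle_{L^2_\tau}|^2 = \sum_{j=1}^{k} |a_j|^2 \left| \left\langle \frac{\ell_j}{\sqrt{\tau}}, g \right\rangle_{L^2_\tau} \right|^2. \tag{53}$$

Let $V_{\tau,k} = \mathrm{Span}_{\mathbb{C}}\{\ell_j/\sqrt{\tau} : j = 1, \ldots, k\}$. We write $g = g_{\tau,k} + g_{\tau,k}^{\perp}$, where $g_{\tau,k} \in V_{\tau,k}$
and $g_{\tau,k}^{\perp} \in V_{\tau,k}^{\perp}$. Then

$$\langle g, A_{\tau,d,k}\,g\rangle_{L^2_\tau} = \sum_{j=1}^{k} |a_j|^2 \left| \left\langle \frac{\ell_j}{\sqrt{\tau}}, g_{\tau,k} \right\rangle_{L^2_\tau} \right|^2. \tag{54}$$

Note that $\dim V_{\tau,k} = k$ for all $\tau > 0$. Direct calculation yields that, for $j \neq l$,

$$\left\langle \frac{\ell_j}{\sqrt{\tau}}, \frac{\ell_l}{\sqrt{\tau}} \right\rangle_{L^2_\tau} = \frac{1}{\tau}\int_0^\tau e^{-is(x_j - x_l)}\,ds = \frac{1 - e^{-i\tau(x_j - x_l)}}{i\tau(x_j - x_l)} \to 0 \quad \text{as } \tau \to \infty.$$

Therefore the distribution of the
eigenvalues of $A_{\tau,d,k}$ approaches the distribution of the eigenvalues of

$$\begin{pmatrix}
|a_1|^2 & 0 & \cdots & 0 \\
0 & |a_2|^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & |a_k|^2
\end{pmatrix} \tag{55}$$

as $\tau \to \infty$. This completes the proof of Eq. (40).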
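
To make the last step concrete, note that the nonzero eigenvalues of an operator of the form $\sum_j |a_j|^2 \langle u_j, \cdot\rangle u_j$ coincide with those of the $k \times k$ matrix $D^{1/2} G D^{1/2}$, where $G$ is the Gram matrix of the unit vectors $u_j$ and $D = \mathrm{diag}(|a_j|^2)$. The sketch below evaluates this matrix for normalized exponentials using the closed-form overlap; the function names and the values of $a_j$ and $x_j$ are hypothetical.

```python
import numpy as np

def overlap(x_j, x_l, tau):
    """(1 / tau) * integral over [0, tau] of exp(i s (x_j - x_l)) ds, in closed form."""
    delta = x_j - x_l
    if delta == 0.0:
        return 1.0 + 0.0j
    return (np.exp(1j * delta * tau) - 1.0) / (1j * delta * tau)

def finite_tau_eigenvalues(amps, points, tau):
    """Eigenvalues of D^{1/2} G D^{1/2}, with G the Gram matrix of the normalized exponentials."""
    k = len(points)
    gram = np.array([[overlap(points[j], points[l], tau) for l in range(k)]
                     for j in range(k)])
    d = np.abs(np.asarray(amps, dtype=float))
    return np.linalg.eigvalsh(d[:, None] * gram * d[None, :])[::-1]

amps = [2.0, 1.0, 0.5]        # assumed coefficients a_j
points = [1.0, -2.5, 0.3]     # assumed spectral points x_j
for tau in (1.0, 10.0, 1.0e4):
    print(tau, finite_tau_eigenvalues(amps, points, tau))
```

For small $\tau$ the exponentials overlap strongly and the eigenvalues differ from $|a_j|^2$; as $\tau$ grows the Gram matrix tends to the identity and the eigenvalues tend to the diagonal entries of the matrix in Eq. (55).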
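
Proof (Equation (41)) Similar to Eq. (53), for any $g \in L^2_\tau$ with $\|g\|_{L^2_\tau} = 1$, we have

\begin{align}
\langle g, A_{\tau,d,\epsilon}\,g\rangle_{L^2_\tau} &= \sum_{j=k+1}^{\infty} |a_j|^2 \left| \left\langle \frac{\ell_j}{\sqrt{\tau}}, g \right\rangle_{L^2_\tau} \right|^2 \le \sum_{j=k+1}^{\infty} |a_j|^2 \left\| \frac{\ell_j}{\sqrt{\tau}} \right\|^2_{L^2_\tau} \|g\|^2_{L^2_\tau} \tag{56}\\
&\le \sum_{j=k+1}^{\infty} |a_j|^2 < \epsilon. \tag{57}
\end{align}

Then the min-max principle implies that $\lambda_{\tau,d,\epsilon,1} < \epsilon$.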


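Proof (Equation (42)) Following [1] (pages 39-41), we first show that

$$\lim_{\tau\to\infty} \frac{1}{\tau}\int_0^\tau |\mu_{f,c}(e^{isx})|\,ds = 0, \tag{58}$$

or equivalently

$$\lim_{\tau\to\infty} \frac{1}{\tau}\int_0^\tau |\mu_{f,c}(e^{isx})|^2\,ds = 0. \tag{59}$$

Equation (58) means that the large moments associated with the continuous spectral
measure have density zero. For any $\epsilon > 0$, we write $\mu_{f,c} = \mu_{f,c,1} + \mu_{f,c,\epsilon}$, in which
$\mu_{f,c,1}$ has compact support, $\mu_{f,c,\epsilon}(\mathbb{R}) < \epsilon$ and $\mu_{f,c,1} \perp \mu_{f,c,\epsilon}$. Denote the support
of $\mu_{f,c,1}$ by $B_1$. Then, using $|a + b|^2 \le 2|a|^2 + 2|b|^2$, we have

\begin{align}
\frac{1}{\tau}\int_0^\tau |\mu_{f,c}(e^{isx})|^2\,ds &\le \frac{2}{\tau}\int_0^\tau |\mu_{f,c,1}(e^{isx})|^2\,ds + \frac{2}{\tau}\int_0^\tau |\mu_{f,c,\epsilon}(e^{isx})|^2\,ds \tag{60}\\
&\le \frac{2}{\tau}\int_0^\tau \left| \int_{\mathbb{R}} e^{isx}\,d\mu_{f,c,1}(x) \right|^2 ds + 2\epsilon^2, \tag{61}
\end{align}

and that

\begin{align}
\frac{1}{\tau}\int_0^\tau |\mu_{f,c,1}(e^{isx})|^2\,ds &= \frac{1}{\tau}\int_0^\tau \left| \int_{\mathbb{R}} e^{isx}\,d\mu_{f,c,1}(x) \right|^2 ds \notag\\
&= \frac{1}{\tau}\int_0^\tau ds \int_{\mathbb{R}}\int_{\mathbb{R}} e^{is(x-y)}\,d\mu_{f,c,1}(x)\,d\mu_{f,c,1}(y) \tag{62}\\
&= \frac{1}{\tau}\int_{\mathbb{R}}\int_{\mathbb{R}} d\mu_{f,c,1}(x)\,d\mu_{f,c,1}(y) \int_0^\tau e^{is(x-y)}\,ds \tag{63}\\
&= \frac{1}{\tau}\int_{B_1}\int_{B_1} d\mu_{f,c,1}(x)\,d\mu_{f,c,1}(y) \int_0^\tau e^{is(x-y)}\,ds. \tag{64}
\end{align}

Note that $\left| \frac{1}{\tau}\int_0^\tau e^{is(x-y)}\,ds \right| \le 1$ for any $\tau > 0$ and any $x, y \in \mathbb{R}$. And when $x \neq y$,

$$\frac{1}{\tau}\int_0^\tau e^{is(x-y)}\,ds = \frac{e^{i\tau(x-y)} - 1}{i\tau(x-y)} \xrightarrow[\tau\to\infty]{} 0. \tag{65}$$

Since $\mu_{f,c,1}$ is continuous, we have that $(\mu_{f,c,1} \times \mu_{f,c,1})\{(x,y) \in \mathbb{R}^2 : x =
y\} = 0$. Hence, the integral in Eq. (64) reduces to an integral over $\mathbb{R}^2 \setminus \{x =
y\}$. Lebesgue's dominated convergence theorem implies that the integral in Eq. (64)
converges to 0 as $\tau \to \infty$. Hence $\limsup_{\tau\to\infty} \frac{1}{\tau}\int_0^\tau |\mu_{f,c}(e^{isx})|^2\,ds \le 2\epsilon^2$ for any $\epsilon > 0$.

This implies Eq. (59).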
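
Equation (59) can be illustrated on a simple continuous measure. In the sketch below, $\mu$ is assumed to be the normalized Lebesgue measure on $[-1, 1]$ (a hypothetical choice, not taken from the text), for which $\mu(e^{isx}) = \sin(s)/s$. The Cesàro mean of $|\mu(e^{isx})|^2$ is estimated with a midpoint rule; the function name and grid size are illustrative.

```python
import numpy as np

def cesaro_mean_sq(tau, n=200_000):
    """Midpoint-rule estimate of (1 / tau) * integral over [0, tau] of |mu_hat(s)|^2 ds,
    where mu_hat(s) = sin(s) / s is the transform of the uniform measure on [-1, 1]."""
    s = (np.arange(n) + 0.5) * tau / n
    mu_hat = np.sinc(s / np.pi)          # np.sinc(x) = sin(pi x) / (pi x)
    return float(np.mean(mu_hat ** 2))

values = [cesaro_mean_sq(tau) for tau in (10.0, 100.0, 1000.0)]
```

The estimates decrease towards zero as $\tau$ grows. For a Dirac measure, by contrast, $|\mu(e^{isx})| = 1$ for all $s$ and the same mean stays equal to one, which is why only the discrete part of the spectrum survives in the limit.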
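For any $g \in L^2_\tau$, Theorem 2 implies that

$$\left(\phi\left(\int_0^\tau g(s)\,\mathcal{K}^s f_c\,ds\right)\right)(x) = \int_0^\tau g(s)\,e^{isx}\,ds. \tag{66}$$

Therefore

\begin{align}
\langle g, A_{\tau,c}\,g\rangle_{L^2_\tau} &= \frac{1}{\tau}\left\langle \int_0^\tau g(s)\,\mathcal{K}^s f_c\,ds,\ \int_0^\tau g(t)\,\mathcal{K}^t f_c\,dt \right\rangle_{\mathcal{H}_f} \tag{67}\\
&= \frac{1}{\tau}\left\langle \phi\left(\int_0^\tau g(s)\,\mathcal{K}^s f_c\,ds\right),\ \phi\left(\int_0^\tau g(t)\,\mathcal{K}^t f_c\,dt\right) \right\rangle_{L^2(\mathbb{R}, d\mu_{f,c})} \tag{68}\\
&= \frac{1}{\tau}\int_{-\infty}^{\infty} d\mu_{f,c}(x) \int_0^\tau\int_0^\tau g(s)\,\overline{g(t)}\,e^{i(s-t)x}\,ds\,dt \tag{69}\\
&= \frac{1}{\tau}\int_0^\tau\int_0^\tau g(s)\,\overline{g(t)}\,\mu_{f,c}(e^{i(s-t)x})\,ds\,dt. \tag{70}
\end{align}

Hence

\begin{align}
|\langle g, A_{\tau,c}\,g\rangle| &\le \frac{1}{\tau}\int_0^\tau\int_0^\tau |g(t)|\,|g(s)|\,|\mu_{f,c}(e^{i(s-t)x})|\,dt\,ds \tag{71}\\
&= \frac{1}{\tau}\iint_{0 \le s \le t \le \tau} |g(t)|\,|g(s)|\,|\mu_{f,c}(e^{i(s-t)x})|\,dt\,ds \tag{72}\\
&\quad + \frac{1}{\tau}\iint_{0 \le t < s \le \tau} |g(t)|\,|g(s)|\,|\mu_{f,c}(e^{i(s-t)x})|\,dt\,ds \tag{73}\\
&= \frac{2}{\tau}\int_0^\tau |g(t)| \int_t^\tau |g(s)|\,|\mu_{f,c}(e^{i(s-t)x})|\,ds\,dt \tag{74}\\
&= \frac{2}{\tau}\int_0^\tau |g(t)| \int_0^{\tau-t} |g(t+s)|\,|\mu_{f,c}(e^{isx})|\,ds\,dt \tag{75}\\
&\le \frac{2}{\tau}\int_0^\tau\int_0^{\tau-t} \frac{1}{2}\left(|g(t)|^2 + |g(t+s)|^2\right)|\mu_{f,c}(e^{isx})|\,ds\,dt \tag{76}\\
&= \frac{1}{\tau}\int_0^\tau |\mu_{f,c}(e^{isx})| \int_0^{\tau-s} \left(|g(t)|^2 + |g(t+s)|^2\right)dt\,ds \tag{77}\\
&\le \frac{2}{\tau}\int_0^\tau |\mu_{f,c}(e^{isx})|\,\|g\|^2_{L^2_\tau}\,ds. \tag{78}
\end{align}

Therefore, by Eq. (58),

$$\lambda_{\tau,c,1} = \max_{g \in L^2_\tau} \frac{\langle g, A_{\tau,c}\,g\rangle_{L^2_\tau}}{\|g\|^2_{L^2_\tau}} \to 0, \tag{79}$$

as $\tau \to \infty$. This completes the proof of Eq. (42).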


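Proof (Equation (43)) Recall that $\mathcal{H}_{f,NU} = L^2([0,+\infty), \mathcal{H}_0)$. Hence $f_{NU}$ can
be represented as a curve from $[0,\infty)$ to $\mathcal{H}_0$. We denote this curve by $\gamma$. Without
ambiguity, we do not distinguish between $\gamma$ and $f_{NU}$. Hence for each $t \ge 0$, $\gamma(t) \in
\mathcal{H}_0$, and $\|\gamma\|^2_{\mathcal{H}_{f,NU}} = \int_0^\infty \|\gamma(t)\|^2_{\mathcal{H}_0}\,dt$. Recall that $(\mathcal{K}^s\gamma)(t) = \gamma(t+s)$. We set
$\gamma(t) = 0$ for all $t < 0$. Hence for any $g \in L^2_\tau$,

\begin{align}
\langle g, A_{\tau,NU}\,g\rangle_{L^2_\tau} &= \frac{1}{\tau}\left\langle \int_0^\tau g(s_1)\,\mathcal{K}^{s_1}\gamma\,ds_1,\ \int_0^\tau g(s_2)\,\mathcal{K}^{s_2}\gamma\,ds_2 \right\rangle_{\mathcal{H}_{f,NU}} \tag{80}\\
&= \frac{1}{\tau}\int_0^\infty\int_0^\tau\int_0^\tau \overline{g(s_2)}\,g(s_1)\,\langle \gamma(t+s_1), \gamma(t+s_2)\rangle_{\mathcal{H}_0}\,ds_1\,ds_2\,dt \tag{81}\\
&= \frac{1}{\tau}\int_0^\tau\int_0^\tau \overline{g(s_2)}\,g(s_1) \int_0^\infty \langle \gamma(t+s_1), \gamma(t+s_2)\rangle_{\mathcal{H}_0}\,dt\,ds_1\,ds_2. \tag{82}
\end{align}

We first show the following identity:

$$\lim_{s\to\infty} \langle \gamma, \mathcal{K}^s\gamma\rangle_{\mathcal{H}_{f,NU}} = \lim_{s\to\infty} \int_0^\infty \langle \gamma(t), \gamma(t+s)\rangle_{\mathcal{H}_0}\,dt = 0. \tag{83}$$

To prove Eq. (83), without loss of generality we assume that $\|\gamma\|_{\mathcal{H}_{f,NU}} = 1$. For
any $\epsilon > 0$, there exists $N_\epsilon$ such that $\int_0^{N_\epsilon} \|\gamma(t)\|^2_{\mathcal{H}_0}\,dt > 1 - \epsilon$. This means that
$\int_{N_\epsilon}^{\infty} \|\gamma(t)\|^2_{\mathcal{H}_0}\,dt < \epsilon$. Therefore, for any $s > N_\epsilon$,

$$\left| \int_0^\infty \langle \gamma(t), \gamma(t+s)\rangle_{\mathcal{H}_0}\,dt \right|^2 \le \int_0^\infty \|\gamma(t)\|^2_{\mathcal{H}_0}\,dt \int_0^\infty \|\gamma(t+s)\|^2_{\mathcal{H}_0}\,dt < \epsilon. \tag{84}$$

This proves Eq. (83). Now we continue with Eq. (82):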


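\begin{align}
\langle g, A_{\tau,NU}\,g\rangle_{L^2_\tau} &\le \left| \frac{1}{\tau}\int_0^\tau\int_0^\tau \overline{g(s_2)}\,g(s_1)\,\langle \mathcal{K}^{s_1}\gamma, \mathcal{K}^{s_2}\gamma\rangle_{\mathcal{H}_{f,NU}}\,ds_1\,ds_2 \right| \tag{85}\\
&\le \frac{2}{\tau}\int_0^\tau\int_0^{\tau-s_1} |g(s_1)|\,|g(s_1+s)|\,|\langle \gamma, \mathcal{K}^s\gamma\rangle_{\mathcal{H}_{f,NU}}|\,ds\,ds_1. \tag{86}
\end{align}

For any $\epsilon > 0$, find $M_\epsilon$ such that $|\langle \gamma, \mathcal{K}^s\gamma\rangle| < \epsilon$ for any $s > M_\epsilon$. Now for
any $\tau > M_\epsilon/\epsilon$ and any $g$ with $\|g\|_{L^2_\tau} = 1$, extended by zero outside $[0,\tau]$, we have

\begin{align}
&\langle g, A_{\tau,NU}\,g\rangle_{L^2_\tau} \tag{87}\\
&\le \frac{2}{\tau}\int_0^\tau\int_0^{M_\epsilon} |g(s_1)|\,|g(s_1+s)|\,|\langle \gamma, \mathcal{K}^s\gamma\rangle_{\mathcal{H}_{f,NU}}|\,ds\,ds_1 \tag{88}\\
&\quad + \frac{2}{\tau}\int_0^\tau\int_{M_\epsilon}^{\tau-s_1} |g(s_1)|\,|g(s_1+s)|\,|\langle \gamma, \mathcal{K}^s\gamma\rangle_{\mathcal{H}_{f,NU}}|\,ds\,ds_1 \tag{89}\\
&\le \frac{1}{\tau}\int_0^\tau\int_0^{M_\epsilon} \left(|g(s_1)|^2 + |g(s_1+s)|^2\right)|\langle \gamma, \mathcal{K}^s\gamma\rangle_{\mathcal{H}_{f,NU}}|\,ds\,ds_1 \tag{90}\\
&\quad + \frac{1}{\tau}\int_0^\tau\int_{M_\epsilon}^{\tau-s_1} \left(|g(s_1)|^2 + |g(s_1+s)|^2\right)|\langle \gamma, \mathcal{K}^s\gamma\rangle_{\mathcal{H}_{f,NU}}|\,ds\,ds_1 \tag{91}\\
&\le \frac{1}{\tau}\int_0^\tau\int_0^{M_\epsilon} |g(s_1)|^2\,ds\,ds_1 + \frac{1}{\tau}\int_0^\tau\int_0^{M_\epsilon} |g(s_1+s)|^2\,|\langle \gamma, \mathcal{K}^s\gamma\rangle_{\mathcal{H}_{f,NU}}|\,ds\,ds_1 \tag{92}\\
&\quad + \frac{1}{\tau}\int_0^\tau\int_{M_\epsilon}^{\tau-s_1} \epsilon\left(|g(s_1)|^2 + |g(s_1+s)|^2\right)ds\,ds_1 \tag{93}\\
&\le \frac{M_\epsilon}{\tau} + \frac{M_\epsilon}{\tau} + \frac{2\epsilon}{\tau}(\tau - M_\epsilon) < 4\epsilon. \tag{94}
\end{align}

Therefore, for $\tau > M_\epsilon/\epsilon$,

$$\lambda_{\tau,NU,1} = \max_{\substack{g \in L^2_\tau \\ \|g\| = 1}} \langle g, A_{\tau,NU}\,g\rangle_{L^2_\tau} \le 4\epsilon. \tag{95}$$

This completes the proof of Eq. (43).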
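
A toy example may help to visualize Eq. (43). Assume, purely for illustration, that $\mathcal{H}_0 = \mathbb{R}$ and $\gamma(t) = e^{-t}$, so that $\int_0^\infty \gamma(t)\gamma(t+s)\,dt = e^{-s}/2$ and, using the dependence on the lag only as in Eq. (86), the kernel of $A_{\tau,NU}$ is $e^{-|s_1 - s_2|}/(2\tau)$. The sketch below discretizes this kernel on a midpoint grid and returns the largest eigenvalue; the function name and grid size are assumptions.

```python
import numpy as np

def top_eigenvalue_nu(tau, n=500):
    """Largest eigenvalue of a midpoint discretization of the kernel exp(-|s1 - s2|) / (2 tau)."""
    t = (np.arange(n) + 0.5) * tau / n
    kernel = 0.5 * np.exp(-np.abs(t[:, None] - t[None, :]))
    return float(np.linalg.eigvalsh(kernel / n)[-1])   # (1 / tau) * kernel * (tau / n)

values = {tau: top_eigenvalue_nu(tau) for tau in (5.0, 50.0)}
```

Since the kernel integrates to at most one along each row, the largest eigenvalue is bounded by roughly $1/\tau$, in line with the decay stated in Eq. (43).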